This is a bit dirty...but I need to mess with the database
Hi Guys
Currently I'm doing some investigation of Poor performance in the Umbraco backoffice for a client.
I got a copy of the live-datbase that I'm now having a look at. I have a suspicion about some heavy usage of unneccesary empty properties on around 5000 nodes could be the issue. But the thing is that if I try to remove the properties on the document types in the settings section it takes like forever and most times I get a timeout.
Therefore I would like to know if there is some SQL script that I could run against the database that will delete certain properties from a certain node. It does not have to be anything fancy at all. Just something I can execute and have the properties remove.
The Umbraco version is 4.7.1.1.
<disclaimer title="don't do this at home kids!"> This is purely for testing purposes - I would never touch the database of a live site or a development site without having a backup and people who read this and get inspired by the answers should always remember to make a backup and be 100% certain about what they're doing. </disclaimer>
Hows about something like this, where doctypealias is your doc type's alias, and propertyalias is the alias of the property you want to remove, this will remove all properties with that alias from all doc types with that alias, be careful! I've made it so you could run the bit in brackets first before running the delete. Will make more sense if you then makde that "select *" instead of "select cmsPropertyData.Id"
delete from cmsPropertyData where id in (select cmsPropertyData.id from cmsPropertyData inner join cmsDocument on cmsDocument.nodeid = contentNodeId inner join cmsPropertyType on cmsPropertyType.id = cmsPropertyData.propertytypeid inner join cmsContentType on cmsContentType.nodeid = cmsPropertyType.contentTypeId and cmsContentType.alias = 'doctypealias' and cmsPropertyType.Alias = 'propertyalias')
If you wanted to do a single node, where the nodeid is 12345:
delete from cmsPropertyData where id in (select cmsPropertyData.id from cmsPropertyData inner join cmsDocument on cmsDocument.nodeid = contentNodeId inner join cmsPropertyType on cmsPropertyType.id = cmsPropertyData.propertytypeid inner join cmsContentType on cmsContentType.nodeid = cmsPropertyType.contentTypeId where cmsContentType.nodeid = 12345 and cmsPropertyType.Alias = 'propertyalias')
I've updated my original post as the old SQL was broken and it could cause someone stumbling upon this post to do bad things to their db. This will delete the specified property from all revisions of that node, sounds like that is what you wanted if it's performance issues
Thanks Steve...but not sure if there is a typo somwhere or if there are more places I should replace something. Running the first script in the original post gives me this error
A big problem with the Umbraco DB is that it is not optimized. If you are running MS SQL DB, then there is big performance to gain if you add columns to some of the indexes.
If you have rights to "sys." on the SQL DB, then the following script can show you wich indexes that is missing on your SQL server.
SELECT migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure_pct ,QUOTENAME(db_name(mid.database_id)) AS [database] ,QUOTENAME(OBJECT_SCHEMA_NAME(mid.object_id, mid.database_id)) AS [schema] ,QUOTENAME(OBJECT_NAME(mid.object_id, mid.database_id)) AS [table] ,'CREATE INDEX [mi_' + SUBSTRING(CONVERT(VARCHAR(64), NEWID()), 1, 8) + ']' + ' ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns, '') + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL(mid.inequality_columns, '') + ')' + ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement ,migs.*,mid.database_id ,mid.[object_id] FROM sys.dm_db_missing_index_groups mig INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10 ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
You really deserve a ton of #h5yr for that previous post - I just tried running the script and I had about 15+ suggestions for indexing the tables.
After I ran all of them performance when saving documents in the backoffice is remarkably improved to take less than 6 seconds. Before indexing it took 18+ seconds.
Stille exploring more options though but this is definently way better already.
I forgot to meantion, that it is a good idea to look at the suggested new indexes. Some of them can be doubles.
Lets say that it suggests these two on same table:
....INCLUDE (NodeId)
....INCLUDE (NodeId, ReleaseDate)
Then you should only add the last index. If you add both, you will still get the performance when Umbraco makes SELECT queries. But when Umbraco makes INSERT, UPDATE, DELETE it has to maintain two indexes instead of one.
Actually, there where a talk about DB optimization on CG13. My goal is to make a pull-request with some of these missing indexes. But I have to have a site with lots of activity both on back-end and front-end. To make sure I get good statistic data.
Ok, so in this particular case, which one of the beneath suggestions should I exectute?
[cmsDocument] ([newest],[expireDate]) INCLUDE ([nodeId]) [cmsDocument] ([newest],[releaseDate]) INCLUDE ([nodeId]) [cmsDocument] ([published]) INCLUDE ([nodeId])
Once again, many thanks! It's really some important knowledge :)
I would execute them all. Because the keys are different.
If you had for example these:
[cmsDocument] ([published]) INCLUDE ([nodeId], [expireDate]) [cmsDocument] ([published]) INCLUDE ([nodeId], [releaseDate]) [cmsDocument] ([published]) INCLUDE ([nodeId])
That tells us that Umbraco is making different requests where [published] is part of WHERE and [nodeId], [expireDate], [releaseDate] is in SELECT (not at the same time).
So by making an index where we include the 3 columns. The SQL server only has to search the index to find the key [publised]. And normally it then has to find the record in the table to get one of the 3 coulmns. But when you make an index where you include columns, these coulms is available when the index is found. So the SQL Server only has to do 1 search in the index :-)
The 3 examples above would I make into 1:
[cmsDocument] ([published]) INCLUDE ([nodeId], [expireDate], [releaseDate])
You could include all the columns in the index, but that would performe bad and the index would be slow. So keep the include part small.
It makes sense, but I will probably have to read through it a couple of times to fully grasp what is going on :)
I'm also thinking that there should be some documentation/guidance regarding these possible performance gains. When to use, why etc. - But I suppose this was also discussed during one of the open space sessions at CG13?
The best part is the these new indexes you make does not break your Umbraco installation. It can only make it performe better. The DB size will increase, but who cares about that ;-)
So it should be easy for HQ to add some new indexes to the SQL script, sinces it does not break the backward-compatible.
Ok, so turns out that on the live-site we went from 22 seconds on save and publish to 4 seconds just by adding the indexes suggested by the first Script Mikkel mentioned on page 1 of this thread.
@kipusoep I have just been able to get the result from the script running on a very active site with lots of members.
I guess "cmsMember" has the default primary key and Indexes. But it was missing this one:
[cmsMember] ([nodeId]) INCLUDE ([Email], [LoginName], [Password])
The improvement was only 36 %, not so much performance to gain from that. The big performances are on the tables "umbracoNode", "cmsContent", "cmsDocument"
For anyone following this thread, it would be a great help if you could run Mikkel's script on a LIVE running v6 database (preferably with a lot of activity) and report back with the results, so we can analyze them and optimize the db schema.
Please post your replies in this thread on the Core dev google group with results attached:
This is a bit dirty...but I need to mess with the database
Hi Guys
Currently I'm doing some investigation of Poor performance in the Umbraco backoffice for a client.
I got a copy of the live-datbase that I'm now having a look at. I have a suspicion about some heavy usage of unneccesary empty properties on around 5000 nodes could be the issue. But the thing is that if I try to remove the properties on the document types in the settings section it takes like forever and most times I get a timeout.
Therefore I would like to know if there is some SQL script that I could run against the database that will delete certain properties from a certain node. It does not have to be anything fancy at all. Just something I can execute and have the properties remove.
The Umbraco version is 4.7.1.1.
<disclaimer title="don't do this at home kids!">
This is purely for testing purposes - I would never touch the database of a live site or a development site without having a backup and people who read this and get inspired by the answers should always remember to make a backup and be 100% certain about what they're doing.
</disclaimer>
Cheers,
Jan
Hows about something like this, where doctypealias is your doc type's alias, and propertyalias is the alias of the property you want to remove, this will remove all properties with that alias from all doc types with that alias, be careful! I've made it so you could run the bit in brackets first before running the delete. Will make more sense if you then makde that "select *" instead of "select cmsPropertyData.Id"
If you wanted to do a single node, where the nodeid is 12345:
Hi Steve
Thanks, I'll try and see what happens.
Let me get back to you.
Cheers,
Jan
Ok, just fired the first one...does not seem like it's hitting anything.
I just gives me "0 rows affected" when I run the script.
It's the first one I'm after - the second one wille be very tedious :)
What could be missing in the first statement?
/Jan
I've probably missed something, let me backup a db and have a play!
Steve
I highy appreciate it - thank you Steve :)
/Jan
I've updated my original post as the old SQL was broken and it could cause someone stumbling upon this post to do bad things to their db. This will delete the specified property from all revisions of that node, sounds like that is what you wanted if it's performance issues
Cheers,
Steve
Thanks Steve...but not sure if there is a typo somwhere or if there are more places I should replace something. Running the first script in the original post gives me this error
Msg 208, Level 16, State 1, Line 1
Invalid object name 'cmsPropertyData'.
There does not seem to be a typo in the above?
/Jan
Ah, just needed to add the db name to the above.
Works like a charm. Thanks a bunch Steve!
/Jan
Assuming you're using SQL Server management studio, have you selected the db in dropdown on the top left?
Steve
A big problem with the Umbraco DB is that it is not optimized. If you are running MS SQL DB, then there is big performance to gain if you add columns to some of the indexes.
If you have rights to "sys." on the SQL DB, then the following script can show you wich indexes that is missing on your SQL server.
Hi Mikkel
Thanks I will check this out when I have got some sleep.
/Jan
Hi Mikkel
You really deserve a ton of #h5yr for that previous post - I just tried running the script and I had about 15+ suggestions for indexing the tables.
After I ran all of them performance when saving documents in the backoffice is remarkably improved to take less than 6 seconds. Before indexing it took 18+ seconds.
Stille exploring more options though but this is definently way better already.
Thanks a bunch for the help guys.
/Jan
Hi Jan
I forgot to meantion, that it is a good idea to look at the suggested new indexes. Some of them can be doubles.
Lets say that it suggests these two on same table:
....INCLUDE (NodeId)
....INCLUDE (NodeId, ReleaseDate)
Then you should only add the last index. If you add both, you will still get the performance when Umbraco makes SELECT queries. But when Umbraco makes INSERT, UPDATE, DELETE it has to maintain two indexes instead of one.
Actually, there where a talk about DB optimization on CG13. My goal is to make a pull-request with some of these missing indexes. But I have to have a site with lots of activity both on back-end and front-end. To make sure I get good statistic data.
/Mikkel
Hi Mikkel
Ok, so in this particular case, which one of the beneath suggestions should I exectute?
[cmsDocument] ([newest],[expireDate]) INCLUDE ([nodeId])
[cmsDocument] ([newest],[releaseDate]) INCLUDE ([nodeId])
[cmsDocument] ([published]) INCLUDE ([nodeId])
Once again, many thanks! It's really some important knowledge :)
/Jan
I would execute them all. Because the keys are different.
If you had for example these:
[cmsDocument] ([published]) INCLUDE ([nodeId], [expireDate])
[cmsDocument] ([published]) INCLUDE ([nodeId], [releaseDate])
[cmsDocument] ([published]) INCLUDE ([nodeId])
That tells us that Umbraco is making different requests where [published] is part of WHERE and [nodeId], [expireDate], [releaseDate] is in SELECT (not at the same time).
So by making an index where we include the 3 columns. The SQL server only has to search the index to find the key [publised]. And normally it then has to find the record in the table to get one of the 3 coulmns. But when you make an index where you include columns, these coulms is available when the index is found. So the SQL Server only has to do 1 search in the index :-)
The 3 examples above would I make into 1:
[cmsDocument] ([published]) INCLUDE ([nodeId], [expireDate], [releaseDate])
You could include all the columns in the index, but that would performe bad and the index would be slow. So keep the include part small.
Does it make sense?
/Mikkel
Hi Mikkel
It makes sense, but I will probably have to read through it a couple of times to fully grasp what is going on :)
I'm also thinking that there should be some documentation/guidance regarding these possible performance gains. When to use, why etc. - But I suppose this was also discussed during one of the open space sessions at CG13?
/Jan
The best part is the these new indexes you make does not break your Umbraco installation. It can only make it performe better. The DB size will increase, but who cares about that ;-)
So it should be easy for HQ to add some new indexes to the SQL script, sinces it does not break the backward-compatible.
/Mikkel
Is there an issue for this in the issue tracker? #lazyweb ;-)
/Jan
Ok, so turns out that on the live-site we went from 22 seconds on save and publish to 4 seconds just by adding the indexes suggested by the first Script Mikkel mentioned on page 1 of this thread.
I expect we get a very happy client again :)
/Jan
I ran the SQL script on our DB server and found out it's not mentioning the 'cmsMember' table, which doesn't have any index or pk at all...
Are you using members on your site? It's looking at your usage so if you don't use members the tables probably wouldn't show up
Yes I've got multiple umbraco websites using the members section. Guess the SQL load is more cmsPropertyData related than cmsMember related.
But I guess no index/pk at all is quite bad, isn't it?
@kipusoep The script only shows the missing indexes where the improvement measure pct is greater than 10 %
So the reason why cmsMember is not showing any missing index, could be that the table is optimized.
/Mikkel
Mikkel, that sure is one nice piece of SQL.
Thanks a lot :-)
@kipusoep I have just been able to get the result from the script running on a very active site with lots of members.
I guess "cmsMember" has the default primary key and Indexes. But it was missing this one:
[cmsMember] ([nodeId]) INCLUDE ([Email], [LoginName], [Password])
The improvement was only 36 %, not so much performance to gain from that. The big performances are on the tables "umbracoNode", "cmsContent", "cmsDocument"
/Mikkel
For anyone following this thread, it would be a great help if you could run Mikkel's script on a LIVE running v6 database (preferably with a lot of activity) and report back with the results, so we can analyze them and optimize the db schema.
Please post your replies in this thread on the Core dev google group with results attached:
https://groups.google.com/forum/#!msg/umbraco-dev/0CCKQCeaVgw/Kwf6VUscxegJ
Thanks!
Morten Christensen
is working on a reply...