Yet another performance question

Hi All,

I am deleting a large number of records from my CMS.
In fact, it's not actually that large (about 250k nodes), but I am getting the same timeouts and freezes that others have found when trying to work with this volume of data.
Clicking on "Empty recycle bin" is a non starter. It just freezes for... forever, giving no progress feedback or anything.
I don't mind if something is going to take a long time, if I can see the progress, so I created a custom api controller that deletes nodes by id.
[Route("privateapi/content/{id}")]
[HttpDelete]
public ActionResult Delete(int id)
{
var existingContent = contentService.GetById(id);
if(existingContent != null)
{
contentService.Delete(existingContent);
}
return Ok();
}
And what I'm doing is this: from another app, I query all the content in the recycle bin and call this endpoint one id at a time.
This might seem extreme, but at least I can see what's happening now.
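For what it's worth, the driver is just a console app looping over the ids. This is a minimal sketch: FetchRecycleBinIdsAsync is a hypothetical placeholder for however you collect the ids, and the base address/port is illustrative.

// Minimal sketch of the external driver app (.NET 6 console app).
// FetchRecycleBinIdsAsync is a hypothetical helper standing in for however
// the recycle bin ids are queried up front.
using System.Diagnostics;

var client = new HttpClient { BaseAddress = new Uri("https://localhost:44300/") };
List<int> ids = await FetchRecycleBinIdsAsync();

foreach (var id in ids)
{
    var sw = Stopwatch.StartNew();
    var response = await client.DeleteAsync($"privateapi/content/{id}");
    sw.Stop();

    // Per-node timing is the whole point; this is where the figures below come from.
    Console.WriteLine($"{id}: {(int)response.StatusCode} in {sw.ElapsedMilliseconds} ms");
}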
And what I am seeing is that it takes between 200 ms and 1000 ms to delete a single node.
I'm running this in Release mode on a 12th-gen i9 with 64 GB of RAM, and there is nothing else on the machine.
Umbraco 10.0.0-rc5
That's insanely slow as far as I am concerned. I would expect to be able to delete all 250k nodes in that amount of time on this machine if I went straight to the database, never mind just one record.
Some basic math (250,000 nodes at roughly 500 ms each) leads me to think that deleting these records is going to take about 34 hours!
At this stage I have had many performance issues due to the number of nodes, which, again, is not enormous. We are due to go live next month and I have lost all confidence that Umbraco will be able to cope in the real world with this much content.
We have yet to migrate our existing users onto the new Umbraco version of the site (there are over 100k of them), and I am very worried about what is going to happen when Umbraco has to deal with that sort of number of members alongside 250k+ content nodes. I have no confidence that it will go well.
Don't get me wrong, I have actually enjoyed working with Umbraco. I think it's great. But from my experience, I don't think it is fit for purpose for sites with this amount of content.
But anyway...
Back to the question at hand. The only clue that anything is actually going wrong is that this is logged every time a node is deleted:
Exception thrown: 'Lucene.Net.QueryParsers.Classic.QueryParser.LookaheadSuccess' in Lucene.Net.QueryParser.dll
Exception thrown: 'System.IO.IOException' in Lucene.Net.QueryParser.dll
Exception thrown: 'System.IO.IOException' in Lucene.Net.QueryParser.dll
I am not sure how to debug these exceptions. All the information I have about them is the three log lines above.
After reading lots of posts on here I am not hopeful for an answer, but I thought I'd ask:
Is there anything I can do to speed up this deletion of nodes?
Could the Lucene exceptions be causing this performance issue? And if so, what can I do about them?
I just checked, and the same exceptions are thrown when you delete using the UI.
And while I was watching the output, I saw all of this in the logs just from clicking around the recycle bin looking for something to delete:
Exception thrown: 'System.IO.IOException' in System.Net.Sockets.dll
Exception thrown: 'System.IO.IOException' in System.Net.Security.dll
Exception thrown: 'System.IO.IOException' in System.Private.CoreLib.dll
Exception thrown: 'Newtonsoft.Json.JsonReaderException' in Newtonsoft.Json.dll
Exception thrown: 'Newtonsoft.Json.JsonReaderException' in Newtonsoft.Json.dll
Exception thrown: 'Newtonsoft.Json.JsonReaderException' in Newtonsoft.Json.dll
Exception thrown: 'System.FormatException' in System.Private.CoreLib.dll
Exception thrown: 'System.ArgumentException' in Newtonsoft.Json.dll
Exception thrown: 'Newtonsoft.Json.JsonSerializationException' in Newtonsoft.Json.dll
Exception thrown: 'Newtonsoft.Json.JsonSerializationException' in Newtonsoft.Json.dll
Exception thrown: 'Newtonsoft.Json.JsonSerializationException' in Newtonsoft.Json.dll
[... the same cycle of JsonReaderException, FormatException, ArgumentException and JsonSerializationException repeats many more times ...]
Is this normal?
Honestly, I think you are right to be concerned. There are many good things about Umbraco, but performance has never been excellent in my experience, particularly for large batch operations. 250,000 nodes seems like quite a lot to me for an Umbraco site, and if your requirements include doing things like deleting 250,000 records at once, I would suggest rethinking your approach prior to launching that site.
Depending on what those hundreds of thousands of nodes represent, and if they all need full CMS functionality, you may want to consider moving some of that data into custom database tables outside of Umbraco. Or simply switching to a CMS that is more optimized for large sites.
All those exceptions are not normal, and they could well be slowing things down for you, but unfortunately I don't know why you'd be getting those.
Your custom code can also be a major factor. Many inefficiencies that would otherwise be minor could be show-stoppers with hundreds of thousands of nodes. You may want to audit any custom code that would be running during backoffice operations for performance.
Thanks for your reply David.
Just to clarify: no, we don't need to delete 250k records on a regular basis. We are still testing, and I am in the process of migrating the old site into the new Umbraco one; I am deleting a large portion of the data in order to change some document types and import it again. So yes, this is probably not going to be an ongoing issue for us.
"Many inefficiencies that would otherwise be minor could be
show-stoppers with hundreds of thousands of nodes"
I have found this to be the case a few times already with Umbraco, and this time it is looking like the same thing again. I was very suspicious of these Lucene.Net exceptions, so I read a bit more. See this comment here:
https://our.umbraco.com/forum/getting-started/installing-umbraco/20999-Disable-Lucene-Examine#comment-138878
It says:
it only rebuilds indexes once when they are not there. Otherwise will just incrementally add to the index when saving/publishing. The performance should be negligeable.
So each time I delete a record, the search indexes are being updated incrementally. All this logic is very deep in the code base, and the Lucene.NET source looks like it was machine-ported from Java or something, so it is hard to debug and see what is actually happening in terms of timings. So I decided to simply remove the default indexes and replace them with a custom index that does not remove deleted items from the index (a rough sketch of which is below).
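Roughly, the replacement index looks like this. A sketch only: NoDeleteContentIndex is my name for it, the constructor parameters are what Umbraco 10's UmbracoContentIndex appeared to take at the time, and the PerformDeleteFromIndex signature should be verified against your exact Umbraco/Examine version.

using System;
using System.Collections.Generic;
using Examine;
using Examine.Lucene;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Umbraco.Cms.Core.Hosting;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Infrastructure.Examine;

// A content index whose delete path does nothing, so trashing/deleting nodes
// never triggers Lucene writes. Deleted nodes simply stay in the index until
// the next full rebuild.
public class NoDeleteContentIndex : UmbracoContentIndex
{
    // Constructor parameters assume Umbraco 10; adjust to match your version's base class.
    public NoDeleteContentIndex(
        ILoggerFactory loggerFactory,
        string name,
        IOptionsMonitor<LuceneDirectoryIndexOptions> indexOptions,
        IHostingEnvironment hostingEnvironment,
        IRuntimeState runtimeState)
        : base(loggerFactory, name, indexOptions, hostingEnvironment, runtimeState)
    {
    }

    // The method replaced "to simply do nothing": index deletions are swallowed here.
    protected override void PerformDeleteFromIndex(
        IEnumerable<string> itemIds,
        Action<IndexOperationEventArgs> onComplete)
    {
        // Intentionally empty.
    }
}

I then removed the default index registrations and registered this class for the content indexes instead; the registration code is specific to my setup, so don't take it as the canonical way to do it.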
The exceptions go away, and my response times drop from a range of 200 ms-1000 ms to a range of 2 ms to 20 ms. It's up to 100 times faster now.
So I accept the point from the linked thread above that an incremental index update is not a huge performance hit on an individual basis; an admin clearing 10 items from the recycle bin is no issue. But (from my findings here, anyway) it doesn't hold up in cases where a lot of data is to be deleted.
There is one thing I didn't confirm, because I need to get back to getting the site ready for release:
Does the recycle bin perform the index update after every individual item in the bin is deleted? I think it does from looking at the code.
And if it does, it might be a better idea to attempt the index update in bulk in some way? I dunno. Something along these lines, maybe:
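// Untested idea, fragment only: Examine's IIndex.DeleteFromIndex accepts a
// collection of ids, so in principle one call could cover the whole batch
// instead of one index operation per node. Here examineManager is an injected
// IExamineManager, and deletedNodeIds is whatever list of trashed node ids is
// available at that point (both assumptions on my part).
if (examineManager.TryGetIndex("ExternalIndex", out IIndex index))
{
    index.DeleteFromIndex(deletedNodeIds.Select(id => id.ToString()));
}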
For now, I have a workaround that has helped me. I hope it can be of use to others.