Performance of bulk Content updates with ContentService

Keith Donnell 82 posts 187 karma points

Feb 06, 2015 @ 20:38

I am working on a custom integration piece that is creating and/or updating thousands of content nodes on a scheduled basis, and was wondering if there are any tricks to make it more performant.

Here is a snippet of my current code:


    var uSvc = ApplicationContext.Services.ContentService;

    foreach (var model in models) // THOUSANDS OF RECORDS
    {
        // Check if exists. If not, create the node
        var modelContent = parentNode.Children().FirstOrDefault(c => c.Name == model.Name);
        if (modelContent == null)
        {
            modelContent = uSvc.CreateContent(model.Name, parentNode, "BoatModel");
        }
        // Update some fields
        modelContent.SetValue("fieldAlias", model.Name);
        // Save
        uSvc.Save(modelContent);
    }

I have tried using the Save() overload that accepts an IEnumerable<IContent>; however, it times out with large data sets and seems to just loop through the list (it doesn't seem any faster than calling Save() on each item individually).

I have also tried instantiating a new thread for each iteration, but that results in SQL deadlocks (on the parentNode.Children() line).

Are there any other recommendations?

Alex Skrypnyk 6182 posts 24284 karma points MVP 9x admin c-trib

Feb 09, 2015 @ 14:07

Hi Keith,

Can you provide all of the code?

Try to get nodes via IPublishedContent and use ContentService only for saving.

Thanks

Keith Donnell 82 posts 187 karma points

Feb 09, 2015 @ 17:25

@Alex Unfortunately I cannot provide all of the code; however, the code above is exactly what is taking HOURS to run.

Also, I don't think getting IPublishedContent nodes will be of much help, since every iteration requires a save, and as far as I know saving always means getting the IContent and using ContentService.

Alex Skrypnyk 6182 posts 24284 karma points MVP 9x admin c-trib

Feb 09, 2015 @ 17:52

Keith, IPublishedContent is much faster than IContent; if you only need to read data from a node, you can use it.

https://our.umbraco.org/forum/developers/api-questions/46631-Getting-Umbraco-Content-IPublishedContent-vs-IContent-vs-Node-vs-Document

IContent is managed by IContentService and is targeted at back-end, read-write access to content. It's what you want to use when you need to manage content.

IPublishedContent is managed by the "content cache" and is targeted at front-end, read-only access to content. It's what you want to use when rendering content.

Also, I don't understand why you iterate over the 'models' collection but don't use it at all inside the foreach. What is parentNode?

Keith Donnell 82 posts 187 karma points

Feb 09, 2015 @ 19:32

Hello Alex,

Yes, I understand that IPublishedContent is faster, since it reads from the content cache, but my point is that I need to update every node I am iterating over, and to do that I need an IContent reference anyway. I could certainly read the content faster with IPublishedContent, but I'm not sure how that helps in my scenario.

"models" is a collection/list of objects I am retrieving from an external source. I need to take this data and update (or insert if it doesn't exist) the associated content node (i.e. "modelContent.SetValue("fieldAlias", model.Name);"). Right now this process takes MANY HOURS to complete and I am hoping there is a bulk insert/update alternative that I have overlooked.

Alex Skrypnyk 6182 posts 24284 karma points MVP 9x admin c-trib

Feb 10, 2015 @ 02:00

Dear Keith, it's a hard task and I can imagine how much time you've spent on it :) But we have a few suggestions:

1) First of all, don't call parentNode.Children() inside the foreach.

2) Try to use .Save(List<IContent> nodes): collect the new nodes in the foreach and save the whole list with a single call to Save.
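Suggestion 1 can be sketched as building a name-to-node index once, before the loop, so each iteration does a dictionary lookup instead of reloading the parent's children. This is a minimal, Umbraco-free sketch: `ExistingNode`, `ChildLookup`, `IndexByName` and `Partition` are hypothetical names, and in a real import the index would be filled from a single read of the parent's children (for example one `ContentService.GetChildren(parentId)` call) rather than from an in-memory list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an existing child node (an IContent in Umbraco).
public sealed class ExistingNode
{
    public int Id { get; }
    public string Name { get; }
    public ExistingNode(int id, string name) { Id = id; Name = name; }
}

public static class ChildLookup
{
    // Build the name -> node index ONCE, before the import loop,
    // instead of calling parentNode.Children() on every iteration.
    public static Dictionary<string, ExistingNode> IndexByName(IEnumerable<ExistingNode> children)
    {
        var index = new Dictionary<string, ExistingNode>(StringComparer.Ordinal);
        foreach (var child in children)
        {
            // Keep the first node for a duplicated name, like FirstOrDefault would.
            if (!index.ContainsKey(child.Name))
                index.Add(child.Name, child);
        }
        return index;
    }

    // Split incoming names into nodes to update and names to create.
    // A missing name is scheduled for creation only once, even if repeated.
    public static (List<ExistingNode> ToUpdate, List<string> ToCreate) Partition(
        IEnumerable<string> incomingNames, IReadOnlyDictionary<string, ExistingNode> index)
    {
        var toUpdate = new List<ExistingNode>();
        var toCreate = new List<string>();
        var scheduled = new HashSet<string>(StringComparer.Ordinal);
        foreach (var name in incomingNames)
        {
            if (index.TryGetValue(name, out var node))
                toUpdate.Add(node);
            else if (scheduled.Add(name))
                toCreate.Add(name);
        }
        return (toUpdate, toCreate);
    }
}
```

The names in `ToCreate` could then be created and passed to the IEnumerable overload of Save in batches, as suggestion 2 proposes.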

Thanks

Keith Donnell 82 posts 187 karma points

Feb 10, 2015 @ 16:01

Hello Alex,

1. That is a good point. I will see if I can get away with using IPublishedContent here instead; however, some of this content isn't published yet, so couldn't I end up with duplicates?

2. According to the source code, it simply loops through each IContent in the collection and performs the same work as the single Save(IContent) method, resulting in the same number of round-trips (and therefore no performance gain). See https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Core/Services/ContentService.cs#L915 . Also, as I mentioned above, I did try that method, and with more than a few hundred IContent objects in the collection I was getting strange timeout issues. At least with individual saves I can tell exactly which record timed out and build in some sort of "continue where we left off" functionality.
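The "continue where we left off" idea can be sketched as a loop that saves one item at a time and records a checkpoint index, so a later run restarts at the first item that failed. `ResumableImport.Run` and its delegates are hypothetical: `saveOne` stands in for a call such as `uSvc.Save(modelContent)`, and `persistCheckpoint` for writing the index to a file or table. The sketch catches every exception for brevity; real code would catch and log only the timeout exceptions it expects.

```csharp
using System;
using System.Collections.Generic;

public static class ResumableImport
{
    // Saves items one at a time, starting at startIndex.
    // Returns the index to resume from on the next run:
    // items.Count if everything succeeded, otherwise the index of the failed item.
    public static int Run<T>(
        IReadOnlyList<T> items,
        int startIndex,
        Action<T> saveOne,
        Action<int> persistCheckpoint)
    {
        for (int i = startIndex; i < items.Count; i++)
        {
            try
            {
                saveOne(items[i]);
            }
            catch (Exception) // brevity only: catch the specific timeout exception in real code
            {
                persistCheckpoint(i); // next run retries this item first
                return i;
            }
            persistCheckpoint(i + 1); // item i is done
        }
        return items.Count;
    }
}
```

Persisting the checkpoint after every item adds one extra write per record; persisting every N items trades a little repeated work for fewer writes.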

Thank you for your help, but it seems like there isn't much room for improvement from what I already have :(

David Sheiles 69 posts 339 karma points

Mar 05, 2015 @ 05:05

Hi Keith,

Did you work out any improvements in your code?

I'm doing exactly the same task as you, and I'm getting random out-of-memory issues.

Are you experiencing the same problem, or just timeout errors?

Richard Soeteman 4054 posts 12927 karma points MVP 3x

Mar 05, 2015 @ 08:10

For CMSImport V3 (http://soetemansoftware.nl/cmsimport) I had the same issue. What I did was first check whether any property had changed, and only save the item if it had.

Maybe you can implement something similar?
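Richard's approach boils down to comparing the incoming values with the node's current values and only saving when something differs. A minimal sketch, assuming both sides have already been reduced to alias-to-string dictionaries (in Umbraco the current values would come from the node's properties); `ChangeDetection.ChangedAliases` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class ChangeDetection
{
    // Returns the property aliases whose incoming value differs from the current one.
    // The node only needs to be loaded as IContent and saved when this list is non-empty.
    public static List<string> ChangedAliases(
        IReadOnlyDictionary<string, string> current,
        IReadOnlyDictionary<string, string> incoming)
    {
        var changed = new List<string>();
        foreach (var pair in incoming)
        {
            // A property with no current value counts as changed.
            if (!current.TryGetValue(pair.Key, out var existing) ||
                !string.Equals(existing, pair.Value, StringComparison.Ordinal))
            {
                changed.Add(pair.Key);
            }
        }
        return changed;
    }
}
```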

Best,

Richard

David Sheiles 69 posts 339 karma points

Mar 05, 2015 @ 11:34

Thanks Richard,

Yes, I've started to go down that same path and it's working much better: I use IPublishedContent to query and check the properties, and only load the IContent if there is a change. However, I can see that I might run into duplicates if a node is unpublished, which could be a problem as my script will run weekly over a lot of nodes. I'll just have to see how it goes.

Cheers,

Dave

Sören Deger 733 posts 2844 karma points c-trib

Mar 05, 2015 @ 11:55

Hi David,

I don't know if this could be a solution; it's just a spontaneous idea:

You could use the ContentService.Unpublished event to write the unpublished node IDs to a custom table or a cache file. Conversely, you could use the ContentService.Published event to remove those node IDs from the custom table or cache file. That way you can proceed as described with IPublishedContent. At the end of your weekly script, you can read the unpublished node IDs from your custom table or cache file and use ContentService only for those unpublished nodes.
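Sören's idea can be sketched as a set of node IDs that the unpublish handler adds to and the publish handler removes from. `UnpublishedTracker` is hypothetical and keeps the set in memory only; wiring `OnUnpublished`/`OnPublished` to the ContentService events he mentions, and persisting the set to a custom table or cache file, are assumed and not shown.

```csharp
using System.Collections.Generic;

// Hypothetical tracker: in a real site the two handlers would be called from the
// ContentService publish/unpublish events, and the set persisted outside memory.
public sealed class UnpublishedTracker
{
    private readonly HashSet<int> _unpublishedIds = new HashSet<int>();

    // Called when nodes are unpublished: remember them.
    public void OnUnpublished(IEnumerable<int> nodeIds)
    {
        foreach (var id in nodeIds) _unpublishedIds.Add(id);
    }

    // Called when nodes are (re)published: they are visible in the cache again.
    public void OnPublished(IEnumerable<int> nodeIds)
    {
        foreach (var id in nodeIds) _unpublishedIds.Remove(id);
    }

    public bool IsUnpublished(int nodeId) => _unpublishedIds.Contains(nodeId);

    // Node IDs the weekly script must still check through ContentService,
    // because the published cache (IPublishedContent) cannot see them.
    public IEnumerable<int> PendingIds => _unpublishedIds;
}
```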

Best,

Sören

David Sheiles 69 posts 339 karma points

Mar 05, 2015 @ 12:29

Thanks Sören,

That may be a good idea. A spontaneous addition to that would be storing all IDs (both published and unpublished) along with a hash of the combined properties. That way I can do a quick lookup to see if a node has changed.

Not sure if creating the hash for each node will be too slow to be worth it.
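A per-node hash like the one David describes can be sketched with SHA-256 from `System.Security.Cryptography`. `PropertyHash.Compute` is a hypothetical helper that assumes the editable fields are available as alias-to-string pairs. Hashing a few short strings is typically far cheaper than a database round-trip, but that is worth measuring on real data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class PropertyHash
{
    // Combines a node's editable fields into one hex string that can be stored next to its ID.
    // Aliases are sorted so the result does not depend on dictionary order, and each
    // alias/value is length-prefixed so ("ab","c") and ("a","bc") do not collide.
    // A null value is treated as an empty string.
    public static string Compute(IReadOnlyDictionary<string, string> fields)
    {
        var sb = new StringBuilder();
        foreach (var pair in fields.OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            var value = pair.Value ?? string.Empty;
            sb.Append(pair.Key.Length).Append(':').Append(pair.Key)
              .Append(value.Length).Append(':').Append(value);
        }

        using (var sha = SHA256.Create())
        {
            var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            return BitConverter.ToString(bytes).Replace("-", string.Empty);
        }
    }
}
```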

If I run into problems I might give it a go.

Keith Donnell 82 posts 187 karma points

Mar 05, 2015 @ 16:12

@David I didn't run into any memory issues on my end, but feel free to take a look at my forked version of the repo on GitHub; I was able to improve bulk insert performance about six-fold.

David Sheiles 69 posts 339 karma points

Mar 06, 2015 @ 03:08

Thanks Keith... Sorry if this is a silly question (I'm a noob with GitHub), but how do I find your forked code? I'd be really interested in seeing what you've done.

Keith Donnell 82 posts 187 karma points

Mar 06, 2015 @ 16:12

@David I apologize. My forked code was for performance updates to the Merchello plugin; I didn't dare touch the Umbraco base. For some reason I thought this was the Merchello topic :/

David Sheiles 69 posts 339 karma points

Oct 20, 2016 @ 12:54

Hey everyone, I know this is an old topic now, but I stumbled across the thread and thought I'd share how my project is going after a year in production. Hopefully this helps others before they go down the same path.
1. Lesson learnt #1: Don't try to store a large set of data as content nodes, especially if the nodes are updated frequently.
2. Lesson learnt #2: Frequent updates will blow out the database, because the data kept for the rollback feature grows dramatically. Matt Brailsford's UnVersion plugin saved me here!
3. To speed things up, I hook into the Saved, Deleted and Trashed events and update/delete rows in a custom database table that holds the content ID, a unique identifier to match the external imported data, a calculated hash of all the editable fields, and a last-modified date.
4. When the weekly script runs to make the updates, I calculate a hash of the new data and compare it with the hash stored in the database.
5. If it has changed, I go ahead and load the content node and save the changes (which fires the Saved event and updates the hash in the DB).
This doesn't help with speed if you know you need to update every node, but it does help when only a smaller subset needs updating.
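Steps 4 and 5 can be sketched as a planning pass that compares each incoming record's hash with the stored one and only marks changed records for loading and saving. `TrackedRow`, `SyncAction` and `WeeklySync.Plan` are hypothetical names mirroring the columns listed above; the incoming hashes are assumed to be computed the same way as the stored ones, and keeping the table current from the Saved, Deleted and Trashed handlers (step 3) is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row of the custom tracking table from step 3.
public sealed class TrackedRow
{
    public int ContentId { get; set; }
    public string ExternalId { get; set; }
    public string Hash { get; set; }
    public DateTime LastModified { get; set; }
}

public enum SyncAction { Create, Update, Skip }

public static class WeeklySync
{
    // Decides, per external record, whether the content node must be touched at all.
    public static Dictionary<string, SyncAction> Plan(
        IEnumerable<KeyValuePair<string, string>> incomingHashesByExternalId,
        IEnumerable<TrackedRow> trackedRows)
    {
        // One in-memory index of the tracking table instead of a query per record.
        var byExternalId = new Dictionary<string, TrackedRow>(StringComparer.Ordinal);
        foreach (var row in trackedRows) byExternalId[row.ExternalId] = row;

        var plan = new Dictionary<string, SyncAction>(StringComparer.Ordinal);
        foreach (var incoming in incomingHashesByExternalId)
        {
            if (!byExternalId.TryGetValue(incoming.Key, out var row))
                plan[incoming.Key] = SyncAction.Create;  // no node yet
            else if (!string.Equals(row.Hash, incoming.Value, StringComparison.Ordinal))
                plan[incoming.Key] = SyncAction.Update;  // load IContent, set values, save
            else
                plan[incoming.Key] = SyncAction.Skip;    // unchanged: never load the node
        }
        return plan;
    }
}
```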

My project is still running well, but the import is still slow when there are a lot of updates.

So, to sum up:
Don't use content nodes for large amounts of data that's regularly updated. You are better off keeping the data in a separate database table and displaying it via custom routes.
Alex Skrypnyk 6182 posts 24284 karma points MVP 9x admin c-trib

Oct 20, 2016 @ 16:02

Thanks David for sharing, great lessons.
