  • Keith Donnell 82 posts 187 karma points
    Feb 06, 2015 @ 20:38
    Keith Donnell
    0

    Performance of bulk Content updates with ContentService

    I am working on a custom integration piece that is creating and/or updating thousands of content nodes on a scheduled basis, and was wondering if there are any tricks to make it more performant.

    Here is a snippet of my current code:


    var uSvc = ApplicationContext.Services.ContentService;

    foreach (var model in models) // THOUSANDS OF RECORDS
    {
        // Check if exists. If not, create the node
        var modelContent = parentNode.Children().FirstOrDefault(c => c.Name == model.Name);
        if (modelContent == null)
        {
            modelContent = uSvc.CreateContent(model.Name, parentNode, "BoatModel");
        }
        // Update some fields
        modelContent.SetValue("fieldAlias", model.Name);
        // Save
        uSvc.Save(modelContent);
    }

    I have tried using the Save() overload that accepts an IEnumerable; however, that times out with large data sets and seems to just loop through the enumerable list (it doesn't seem any more performant than calling the individual Save() in place).

    I have also tried instantiating a new thread for each iteration, but that results in SQL deadlocks (on the parentNode.Children() line).

    Are there any other recommendations?

  • Alex Skrypnyk 6148 posts 24097 karma points MVP 8x admin c-trib
    Feb 09, 2015 @ 14:07
    Alex Skrypnyk
    0

    Hi Keith,

    Can you provide all of the code?

    Try to read the nodes via IPublishedContent and use ContentService only for saving.

    Thanks

  • Keith Donnell 82 posts 187 karma points
    Feb 09, 2015 @ 17:25
    Keith Donnell
    0

    @Alex Unfortunately I cannot provide all code, however the code above is exactly what is taking HOURS to run.

    Also, I don't think getting IPublishedContent nodes will be of much help, since every iteration requires a save (and as far as I know, saving always means getting the IContent and using ContentService).

  • Alex Skrypnyk 6148 posts 24097 karma points MVP 8x admin c-trib
    Feb 09, 2015 @ 17:52
    Alex Skrypnyk
    0

    Keith, IPublishedContent is much faster than IContent. If you only need to read data from a node, you can use it.

    https://our.umbraco.org/forum/developers/api-questions/46631-Getting-Umbraco-Content-IPublishedContent-vs-IContent-vs-Node-vs-Document

    IContent is managed by IContentService and is targeted at back-end, read-write access to content. It's what you want to use when you need to manage content.

    IPublishedContent is managed by the "content cache" and is targeted at front-end, read-only access to content. It's what you want to use when rendering content.

    Also, I can't understand why you iterate over the 'models' collection and don't use it at all inside the foreach. What is parentNode?

  • Keith Donnell 82 posts 187 karma points
    Feb 09, 2015 @ 19:32
    Keith Donnell
    0

    Hello Alex,

    Yes, I understand that IPublishedContent is faster, since that's the method to pull cached content, but I am saying that I need to update every node I am iterating, and to do that I need an IContent reference anyways.  I could surely pull the content faster with IPublishedContent, but I'm not sure how that helps in my scenario.

    "models" is a collection/list of objects I am retrieving from an external source.  I need to take this data and update (or insert if it doesn't exist) the associated content node (i.e. "modelContent.SetValue("fieldAlias", model.Name);").  Right now this process takes MANY HOURS to complete and I am hoping there is a bulk insert/update alternative that I have overlooked.

  • Alex Skrypnyk 6148 posts 24097 karma points MVP 8x admin c-trib
    Feb 10, 2015 @ 02:00
    Alex Skrypnyk
    0

    Dear Keith, it's a hard task and I can imagine how much time you've spent on it :) But we have a few suggestions:

    1) First of all, don't call parentNode.Children() inside the foreach.

    2) Try using .Save(List<IContent> nodes): collect the new nodes in the foreach and save the whole list with a single Save call.
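
    A minimal sketch of both suggestions, written as plain C# helpers rather than as Umbraco API calls. The names BulkImportHelpers, IndexByName and InBatches are hypothetical and do not exist in Umbraco; the idea is to fetch the children once, index them by name, and hand the nodes to Save in small batches:

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical helpers for illustration; not part of Umbraco.
    public static class BulkImportHelpers
    {
        // Build a name -> item lookup in one pass, so each model is matched with a
        // dictionary lookup instead of a fresh scan of the children.
        // On duplicate names the first item wins, mirroring FirstOrDefault.
        public static Dictionary<string, T> IndexByName<T>(
            IEnumerable<T> items, Func<T, string> getName)
        {
            var index = new Dictionary<string, T>();
            foreach (var item in items)
            {
                var name = getName(item);
                if (!index.ContainsKey(name))
                {
                    index.Add(name, item);
                }
            }
            return index;
        }

        // Split a sequence into lists of at most batchSize items.
        // This is an iterator, so the argument check runs when enumeration starts.
        public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
        {
            if (batchSize < 1)
            {
                throw new ArgumentOutOfRangeException("batchSize");
            }

            var batch = new List<T>(batchSize);
            foreach (var item in source)
            {
                batch.Add(item);
                if (batch.Count == batchSize)
                {
                    yield return batch;
                    batch = new List<T>(batchSize);
                }
            }
            if (batch.Count > 0)
            {
                yield return batch;
            }
        }
    }
    ```

    In the loop from the first post, this would presumably mean loading the children once before the foreach, indexing them with IndexByName(children, c => c.Name), adding each newly created node to the index, and passing each batch to Save. Batching is not expected to reduce database round trips; it only keeps each Save call small and gives a natural point to resume from after a failure.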

     Thanks

  • Keith Donnell 82 posts 187 karma points
    Feb 10, 2015 @ 16:01
    Keith Donnell
    0

    Hello Alex,

    1.  That is a good point.  I will see if I can get away with using IPublishedContent here instead, however some of this content isn't actually published yet - won't I end up with duplicates potentially?

    2.  According to the source code, it simply loops through each IContent in the collection and performs the same tasks that the simple Save(IContent) method performs, resulting in the same amount of round-trips (and therefore gains no performance).  See https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Core/Services/ContentService.cs#L915 .  Also, as I mentioned above, I did try that method and when you have more than a few hundred IContent objects in your collection, I was getting strange timeout issues - at least if I do the individual save, I can tell exactly which record timed out and can code in some sort of "Continue where we left off" functionality.

    Thank you for your help, but it seems like there isn't much room for improvement from what I already have :(

  • David Sheiles 67 posts 337 karma points
    Mar 05, 2015 @ 05:05
    David Sheiles
    0

    Hi Keith,

    Did you work out any improvements in your code?

    I'm doing the exact same task as yourself and I'm getting random out of memory issues.

    Are you experiencing the same problem or simply just timeout errors?

  • Richard Soeteman 4045 posts 12898 karma points MVP 2x
    Mar 05, 2015 @ 08:10
    Richard Soeteman
    0

    For CMSImport V3 http://soetemansoftware.nl/cmsimport I had the same issue. What I did was first check whether a property has changed, and only save the item if it has.

    Maybe you can implement something similar?

    Best,

    Richard

  • David Sheiles 67 posts 337 karma points
    Mar 05, 2015 @ 11:34
    David Sheiles
    0

    Thanks Richard,

    Yes, I've started to go down that same path and it's working much better: I use IPublishedContent to query and check the properties, and only load the IContent if there is a change. I can see that I might run into duplicates if a node is unpublished, though, which could be a problem as my script will run weekly over a lot of nodes. I'll just have to see how it goes.

    Cheers,

    Dave

  • Sören Deger 733 posts 2844 karma points c-trib
    Mar 05, 2015 @ 11:55
    Sören Deger
    0

    Hi David,

    I do not know if this could be a solution, but simply a spontaneous idea:

    You can use the ContentService.Unpublished event to write the unpublished node IDs to a custom table or a cache file. Conversely, you can use the ContentService.Published event to remove the node IDs from the custom table or cache file. That way you can proceed as described with IPublishedContent. At the end of your weekly script you can read the unpublished nodes from your custom table or cache file and use ContentService only for those nodes.
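
    A rough sketch of the bookkeeping side of this idea, using an in-memory set as a stand-in for the custom table or cache file. UnpublishedNodeTracker and its methods are hypothetical names; the event wiring and the persistence are left out and would need to be checked against your Umbraco version:

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical stand-in for the custom table / cache file of unpublished node IDs.
    public sealed class UnpublishedNodeTracker
    {
        private readonly HashSet<int> _ids = new HashSet<int>();
        private readonly object _gate = new object();

        // Call from the unpublished event handler with the affected node IDs.
        public void MarkUnpublished(IEnumerable<int> nodeIds)
        {
            lock (_gate)
            {
                foreach (var id in nodeIds)
                {
                    _ids.Add(id);
                }
            }
        }

        // Call from the published event handler with the affected node IDs.
        public void MarkPublished(IEnumerable<int> nodeIds)
        {
            lock (_gate)
            {
                foreach (var id in nodeIds)
                {
                    _ids.Remove(id);
                }
            }
        }

        // Sorted copy of the IDs that the weekly script still has to handle
        // through ContentService.
        public IList<int> Snapshot()
        {
            lock (_gate)
            {
                return _ids.OrderBy(id => id).ToList();
            }
        }
    }
    ```

    The lock is there because event handlers may fire on different request threads while the scheduled script is reading the set.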

     

    Best,

    Sören

  • David Sheiles 67 posts 337 karma points
    Mar 05, 2015 @ 12:29
    David Sheiles
    0

    Thanks Sören,

    That may be a good idea. A spontaneous addition to that is storing all IDs (both published and unpublished) along with a hash of the combination of properties. That way I can do a quick lookup to see if anything has changed.

    Not sure if creating the hash for each node will be too slow to be worth it. 

    If I run into problems I might give it a go. 

  • Keith Donnell 82 posts 187 karma points
    Mar 05, 2015 @ 16:12
    Keith Donnell
    0

    @David I didn't run into any memory issues on my end, but feel free to take a look at my forked version of the repo on Github - I was able to improve bulk insert performance about 6-fold.

  • David Sheiles 67 posts 337 karma points
    Mar 06, 2015 @ 03:08
    David Sheiles
    0

    Thanks Keith... Sorry if this is a silly question (I'm a noob with GitHub), how do I find your forked code? I'd be really interested in seeing what you've done.

  • Keith Donnell 82 posts 187 karma points
    Mar 06, 2015 @ 16:12
    Keith Donnell
    0

    @David I apologize.  My forked code was for performance updates to the Merchello plugin - I didn't dare touch the Umbraco base.  For some reason I thought this was the Merchello topic :/

  • David Sheiles 67 posts 337 karma points
    Oct 20, 2016 @ 12:54
    David Sheiles
    102

    Hey Everyone, I know this is an old topic now, but I stumbled across the thread and thought I'd share how my project is going after a year in production. Hope to help others out before going down the same path.

    1. Lesson Learnt #1 - Don't try to store a large set of data as content nodes, especially if they are being updated frequently.
    2. Lesson Learnt #2 - Frequent updates will blow out the database, as the version history kept for the rollback feature grows dramatically. Matt Brailsford's UnVersion plugin saved me here!
    3. To speed things up, I hook into the saved, deleted & trashed events and then update/delete a custom database table that holds the content ID, a unique identifier to match the external imported data, a calculated hash of all the editable fields, and a last modified date.
    4. When the weekly script runs to make the updates, I calculate a hash of the new data and compare it with the hash stored in the database.
    5. If it has changed, I go ahead and load the content node and save the changes (which fires the saved event and updates the hash in the DB).
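
    A minimal sketch of the hash comparison in steps 3 to 5, assuming the editable fields are available as alias/value string pairs. ContentFingerprint, Compute and HasChanged are hypothetical names, SHA-256 is just one reasonable choice of hash, and the custom table itself is not shown:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    // Hypothetical helper for illustration; not part of Umbraco.
    public static class ContentFingerprint
    {
        // Hash all fields in a stable order so the same data always gives the same hash.
        // A null value is treated the same as an empty string.
        public static string Compute(IDictionary<string, string> fields)
        {
            var sb = new StringBuilder();
            foreach (var pair in fields.OrderBy(p => p.Key, StringComparer.Ordinal))
            {
                var value = pair.Value ?? string.Empty;
                // Length prefixes keep ("ab", "c") distinct from ("a", "bc").
                sb.Append(pair.Key.Length).Append(':').Append(pair.Key);
                sb.Append(value.Length).Append(':').Append(value);
            }

            using (var sha = SHA256.Create())
            {
                var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
                return BitConverter.ToString(bytes).Replace("-", string.Empty);
            }
        }

        // True when the incoming data no longer matches the stored hash,
        // including when no hash has been stored yet (storedHash is null).
        public static bool HasChanged(string storedHash, IDictionary<string, string> incoming)
        {
            return !string.Equals(storedHash, Compute(incoming), StringComparison.OrdinalIgnoreCase);
        }
    }
    ```

    Only when HasChanged returns true would the script go on to load the content node through ContentService, so unchanged records cost one hash and one table lookup.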

    This doesn't help with speed if you know that you need to update every node, but it does help when you have a smaller subset to update.

    My project is still running well, but the import is still slow when there are a lot of updates.

    So to sum up:
    Don't use content nodes for large amounts of data that's regularly updated. You are best off keeping the data in a separate database table and managing the display of the data via custom routes.

  • Alex Skrypnyk 6148 posts 24097 karma points MVP 8x admin c-trib
    Oct 20, 2016 @ 16:02
    Alex Skrypnyk
    0

    Thanks David for sharing, great lessons.
