
  • jaygreasley 416 posts 403 karma points
    May 09, 2011 @ 12:08
    jaygreasley
    0

    Performance

    I was wondering whether anyone has performance metrics they would like to share on the time taken to import content using CMSImport.

    I know the source data and the hardware will affect this, but I am trying to get an idea of the time needed to import ~10,000 records, each with about 20-30 fields.

    Would testing the import with the free version (500 records) and then multiplying up be sufficient, or does performance degrade as the quantity increases?

    tia

    Jay

  • Richard Soeteman 4035 posts 12842 karma points MVP
    May 09, 2011 @ 12:24
    Richard Soeteman
    1

    Hi Jay,

    Performance does degrade as the quantity increases. Importing 10,000 records has been done before, but it will take some time (approx. an hour; if you import into a single folder it might take even longer) and you need to tweak some config settings. These are documented in the manual: http://www.cmsimport.com/documentation.aspx

    I have thought about writing my own DataLayer in the past, but if Umbraco changes something in their DB schema, CMSImport could cause strange issues.

    The current version is also compatible with Umbraco v4.0, but the next version will be compatible with 4.5+ only. That means I can use the optimized mode of a document, and I must say that's much faster. I don't have exact numbers yet.

    Cheers,

    Richard

  • Jonathan Lathigee 56 posts 99 karma points
    May 11, 2011 @ 05:26
    Jonathan Lathigee
    0

    Hi Richard

    Can you please point to where in the manual there is mention of performance tuning for imports? I've read through and can't find anything.

    I am importing 3000 nodes, but (and I think this is key) with *related media*. I've had to split them into 250-record batches because of the time involved, and they're still taking 20 minutes-ish each. For a one-time import, this is painful but doable; for automated future updates this will be problematic. Is there anything I can do on my end to extend timeouts and (preferably) speed up imports?

    Thanks

    Jonathan

  • Richard Soeteman 4035 posts 12842 karma points MVP
    May 11, 2011 @ 08:17
    Richard Soeteman
    0

    Hi,

    The good news is that it will not time out, since I increase the timeout at the start of the import. Also, updates to a given node are imported faster than the initial import. And for media, if the item already exists, the existing reference is reused.

    There are a few tricks you can use. I thought I had added them to the manual; I will do so for 2.0.

    First, disable cache updates for every action. You can do this by setting ContinouslyUpdateXmlDiskCache to false in UmbracoSettings.config:

    <!-- Update disk cache every time content has changed -->
    <ContinouslyUpdateXmlDiskCache>False</ContinouslyUpdateXmlDiskCache>

    It's also good not to have many nodes underneath one node. If all 3000 nodes are stored underneath one root node, it might come in handy to use a DateFolder or Alphabetfolder package to structure the nodes automatically.

    If you have set autopublish to true, you can also set XmlCacheEnabled to false. This prevents the XML cache file from being written over and over again for every publish. With this disabled, site startup will be slightly slower, since the XML file is normally used during startup to load the nodes.

    <!-- Enable / disable xml content cache on disk, only needed for faster startup time -->
    <XmlCacheEnabled>False</XmlCacheEnabled>
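
    Taken together, the two settings above might look like this in the config file. This is only a minimal sketch, assuming the usual Umbraco 4.x layout in which both elements sit inside the content section of umbracoSettings.config; every other setting is omitted, so check the nesting against your own file before editing.

```xml
<settings>
  <content>
    <!-- other content settings omitted from this sketch -->

    <!-- Enable / disable xml content cache on disk, only needed for faster startup time -->
    <XmlCacheEnabled>False</XmlCacheEnabled>

    <!-- Update disk cache every time content has changed -->
    <ContinouslyUpdateXmlDiskCache>False</ContinouslyUpdateXmlDiskCache>
  </content>
</settings>
```

    If the disk cache matters for your startup time, switch both values back once the import has finished.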

    If you have set publish to true, it's also best to use the latest Umbraco version, since it handles locking of the Lucene indexes much better and no weird exceptions are thrown.

    That's basically it. In 2.0 I will drop support for Umbraco 4.0. Then I can use the optimized mode of the API, which will improve performance as well. Still, the API is a bit slow. This will be addressed in version 5 of Umbraco.

    Please let me know if you have any additional questions.

    Cheers,

    Richard

     

  • Jonathan Lathigee 56 posts 99 karma points
    May 11, 2011 @ 17:53
    Jonathan Lathigee
    0

    Hi Richard

    Thanks for all these tips - I'll give them a try - and sorry, Jay, for hijacking your thread (seemed to continue the discussion, though).

    A question of clarification:

    If I set XmlCacheEnabled to false, the XML cache is not updated when each node is published. Fine. So what are the implications? My understanding is that the Lucene indexer gets triggered on node publish (correct me if I'm wrong). Will the indexer still be called (i.e. does it index against the node on publish, or does it use the XML cache)?

    And the XML cache is required to actually serve the site, right? So does the cache get created in its entirety on app start (if it hasn't been updated after each publish)? Or node by node when pages are requested by the client?

    I don't have a very strong understanding of what the XML cache does / how it's used by the system.

    Thanks again

    Jonathan

  • Richard Soeteman 4035 posts 12842 karma points MVP
    May 11, 2011 @ 20:11
    Richard Soeteman
    0

    Hi Jonathan,

    No worries. XmlCacheEnabled is only there for a fast startup. The internal cache is kept in memory. If you publish, this internal cache will still be updated. And during app_start it will build the cache from the database instead of the XML file. This is a bit slower, but it happens only once, and in your situation it will improve your import process.

    Publish will still trigger the Lucene indexer; it just doesn't write the whole cache to disk.

    Cheers,

    Richard

  • Mayank 14 posts 44 karma points
    Jan 22, 2016 @ 07:06
    Mayank
    0

    Hi Richard,

    I am facing an issue while importing data from Excel with the licensed version of CMSImport.

    This is on a traditional load-balanced environment (Umbraco v7.1.8).

    The upload does not complete.

    Half of the nodes are created and published, some are created but not published, and others are missing.

    I need to know what effect these optimization settings will have in the load-balanced environment.

    I also need help with updating to a newer version of the plugin.

    Regards, Mayank Parekh.
