Importing content is taking very long

Scott Babcock 12 posts 92 karma points

Jan 22, 2019 @ 20:55

Importing content is taking about 12 to 13 minutes per piece of content. The content itself isn't complicated, just text and numbers. We started with a CSV file of about 15k records, then pared it down to 1000, then 100, and finally 20.

I'm not sure what is going on and I'm just wondering if there is any help/advice you can give.

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Jan 23, 2019 @ 07:14

Hi Scott,

I think it is related to pickers, since those do a lookup over content nodes. Configure the datatype so that it only returns a small set of nodes (prefer node IDs over XPath) and it should already be faster.

Also, 15K records is quite a lot; I would suggest importing in batches of 1K records.
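
For anyone wanting to try the batching suggestion, here is a minimal sketch in Python of splitting one large CSV into files of 1,000 records each. It is not part of CMSImport; the helper names `split_csv` and `write_batches` and the output file naming are hypothetical, and it assumes the CSV has a single header row that every batch file should repeat.

```python
import csv
from typing import Iterable, Iterator, List


def split_csv(source: Iterable[str], batch_size: int = 1000) -> Iterator[List[List[str]]]:
    """Yield batches of rows; every batch starts with a copy of the header row."""
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    batch = [header]
    for row in reader:
        batch.append(row)
        # len(batch) - 1 is the number of data rows (header excluded)
        if len(batch) - 1 == batch_size:
            yield batch
            batch = [header]
    if len(batch) > 1:
        yield batch  # final, partially filled batch


def write_batches(input_path: str, output_prefix: str, batch_size: int = 1000) -> List[str]:
    """Write each batch to '<output_prefix>_NNN.csv' and return the paths written."""
    paths = []
    with open(input_path, newline="", encoding="utf-8") as src:
        for index, batch in enumerate(split_csv(src, batch_size), start=1):
            path = f"{output_prefix}_{index:03d}.csv"
            with open(path, "w", newline="", encoding="utf-8") as dst:
                csv.writer(dst).writerows(batch)
            paths.append(path)
    return paths
```

Each resulting file can then be imported as its own run, so a slow or failed batch only affects 1,000 records at a time.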

Hope this makes it a bit faster. The thing is that I am using the Umbraco API, which is not the fastest either, but better safe than sorry in this case.

Best,

Richard

Scott Babcock 12 posts 92 karma points

Jan 24, 2019 @ 16:04

I think you might be right about where the problem is. I added some logging and found that the delay comes between Umbraco's "Saved Xml to file" message and the importer's RecordImported event.

2019-01-24 09:24:50,820 [P3092/D2/T30] DEBUG Goldmark.Umbraco.CmsImportEventHandler - Units importing record 37236
2019-01-24 09:24:54,650 [P3092/D2/T34] INFO  umbraco.content - Save Xml to file...
2019-01-24 09:24:55,818 [P3092/D2/T34] INFO  umbraco.content - Saved Xml to file.
2019-01-24 09:44:13,490 [P3092/D2/T30] DEBUG Goldmark.Umbraco.CmsImportEventHandler - Units imported record 37236 with action Overwrite
2019-01-24 09:44:13,507 [P3092/D2/T30] INFO  Umbraco.Core.Publishing.PublishingStrategy - Content '104' with Id '34042' has been published.
2019-01-24 09:44:14,325 [P3092/D2/T30] INFO  Umbraco.Core.Services.ContentService - Call was made to ContentService.Publish, use PublishWithStatus instead since that method will provide more detailed information on the outcome
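
A gap like this can also be found mechanically rather than by eye. The following is a rough Python sketch that flags long pauses between consecutive log lines. It assumes every entry starts with a `yyyy-MM-dd HH:mm:ss,fff` timestamp as in the excerpt above; `find_gaps` and `SAMPLE` are names made up for this illustration, and lines without a leading timestamp (stack traces, for example) are skipped.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log entry, e.g. "2019-01-24 09:24:50,820 "
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")

# Four of the log lines quoted above, copied verbatim
SAMPLE = """\
2019-01-24 09:24:50,820 [P3092/D2/T30] DEBUG Goldmark.Umbraco.CmsImportEventHandler - Units importing record 37236
2019-01-24 09:24:54,650 [P3092/D2/T34] INFO  umbraco.content - Save Xml to file...
2019-01-24 09:24:55,818 [P3092/D2/T34] INFO  umbraco.content - Saved Xml to file.
2019-01-24 09:44:13,490 [P3092/D2/T30] DEBUG Goldmark.Umbraco.CmsImportEventHandler - Units imported record 37236 with action Overwrite
"""


def find_gaps(lines, threshold_seconds=60.0):
    """Return (seconds, line_before, line_after) for each pause >= threshold."""
    gaps = []
    previous = None  # (timestamp, raw line) of the last timestamped entry
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # continuation line without its own timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            seconds = (stamp - previous[0]).total_seconds()
            if seconds >= threshold_seconds:
                gaps.append((seconds, previous[1].rstrip(), line.rstrip()))
        previous = (stamp, line)
    return gaps


if __name__ == "__main__":
    for seconds, before, after in find_gaps(SAMPLE.splitlines()):
        print(f"{seconds / 60:.1f} min pause\n  after:  {before}\n  before: {after}")
```

Note that the sketch compares consecutive lines regardless of thread, so on a busy site the pause may be hidden by unrelated entries; filtering on the thread id (`T30` here) first would be one way around that.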

Are you still thinking it is related to pickers and having to do lookups over content nodes?

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Jan 24, 2019 @ 16:06

Hi Scott,

Good find. I assume you have set autopublish to true? It might be worth setting that to false and publishing the root node manually. It is indeed taking a long time to update the Umbraco XML file, which I assume is huge?

Best,

Richard

Scott Babcock 12 posts 92 karma points

Jan 25, 2019 @ 22:20

Good news! Through trial and error I was able to bring the import down to minutes or even seconds, depending on how many records are imported (500 took 20 seconds).

I was able to do this by removing an advanced setting on the import. We have a multinode tree picker that points to content in another root folder. This lookup was what was eating up all the time.

While digging through the logs I found something interesting (the property it is referring to is our multinode tree picker):

2019-01-25 15:54:17,861 [P4300/D10/T47] ERROR CMSImportLibrary.AdvancedSettingControls.Controls.MNTP.MNTP2Control - CMSImport:Could not determine media for property amenities ON unit
System.NullReferenceException: Object reference not set to an instance of an object.
   at Umbraco.Core.Persistence.Database.<Query>d__74`1.MoveNext()
   at CMSImportLibrary.Helpers.DataHelper.GetPreValueByDataTypeDefinitionId(Int32 , Int32 )
   at CMSImportLibrary.TypeExtensions.PropertyExtensions.GetPreValue(PropertyInfo pt, Int32 index)
   at CMSImportLibrary.AdvancedSettingControls.Controls.MNTP.MNTP2Control.IsMedia(String documentAlias, String propertyAlias)

The thing I found interesting is that it says it can't determine the "media". The property points to content, not media. Could this be a bug in the code? Something I caused? Just wanted to get your thoughts. Thanks.

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Jan 28, 2019 @ 07:39

Hi Scott,

Great that it's working. The advanced setting is indeed only for media; for content this is done automatically. Did you select the advanced setting? I have fixed this error in the latest CMSImport release, so I'm curious why it is still logging this.

Best,

Richard

Scott Babcock 12 posts 92 karma points

Jan 28, 2019 @ 15:04

I don't have any advanced settings. Since it is a Content multinode tree picker everything is done automatically.

Is using Media for a multinode tree picker more performant than Content?

As a note I am using CMSImport 3.7.6.

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Jan 28, 2019 @ 16:52

Hi Scott,

Yes, Media is much faster since that just imports the files as media and sets the UDIs. No need for lookups in that case.

Best,

Richard

Trevor Evans 2 posts 72 karma points

Jun 20, 2019 @ 00:15

First of all - Richard, awesome package. Has saved me countless hours. Well worth the price for the Pro edition.

Jumping in here since my issue is very similar to the one Scott reported. I have multiple import definitions set up in the 1,000+ records range. It seems like the more records that already exist for the document type I am importing, the longer it takes. I also seem to have more issues with child definitions, particularly when the "parent" has 1000+ records.

The current import I am running has ~1800 "personnel" records that are the children of ~2000 projects. The Personnel document type has a Multinode Treepicker where you can select from ~900 Contact records that have already been imported. Looks like a similar setup to what Scott had, though I don't have any "advanced setting" enabled that I know of. I'm also not seeing any errors in the log file.

When I run the import, the first few records are always slow ... like up to 30 mins per record. Then it's like the floodgates open and it starts importing a record every second or two and runs that way for quite a while, then suddenly grinds to a halt and goes back to 30 mins a record again. It is currently "hung" on record 1528 of 1774.

Like Scott, I also see in the log file where there is a "saved to xml" message in between these long delays. It's odd that it does this for some random records and not for all. I'm not sure what else to try to speed things up. In this case, breaking it up into smaller chunks doesn't seem to help - the number of records being imported doesn't seem to matter as much as the number of records that already exist.

For what it's worth, my datasource is SQL Server and my query runs in less than a second. I'm going to have to run all my imports at least 1 more time before we take this live, so I'd appreciate any ideas you might have for working around this issue.

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Jun 21, 2019 @ 07:05

Hi Trevor,

Thanks for the kind words. There are no advanced settings for content nodes; instead, CMSImport checks the datatype settings of the picker. It is best to configure that as a node picker instead of XPath and to narrow the selection as much as possible. If saving the XML is taking a long time, then disabling autopublish and manually publishing the parent, including its children, after the import might be a faster option.

Hope this helps,

Richard
