Loading large number of images into media library very slow
Hi,
I have a site with 140,000 hi-res images covering 20 years of business. I have been loading them into Umbraco using MediaService.CreateMedia() but it is an extremely slow process.
Part of the reason might be that the database already has a lot of nodes (50K). Another reason is that I have a MediaService.Saving event handler that moves the image. I do this so I can organize the images in the media folder so that they are not all in the same folder. But, at this rate it will take me weeks of non-stop processing to get the images loaded.
Does anyone know a faster way to load them? It's only a one-time load, so maybe some appropriate direct-to-database SQL code would help?
Any advice would be appreciated.
Thanks, Paul.
See here: https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/72898-faster-way-to-publish-a-large-number-of-nodes#comment-233951
Basically, try disabling Examine, as I suspect it is Examine indexing that is slowing things down.
However, if you do that, you probably can't use the save event anymore to do the moving of your media (Examine is used to fetch media info). You'd want to put them in the right place right away (when you create media, you can specify the parent folder).
Once you are done with the import, you can enable and rebuild the Examine indexes.
I haven't tried this, but if it works let us know :-)
Thanks for your input Nicolas.
I will try disabling Examine to see if it helps, but my understanding is that Examine indexing happens in the background anyway, right?
As for specifying the parent folder, I already do that. The organization of the images in the Media Library UI is taken care of. Lots of folders within folders, so that it's easy to find stuff and I don't have any folders with too many images.
The "moving" I am referring to is the location of the files on disk. Normally, they are stored in /media/nnnn/image.jpg. I want them in a much deeper structure that matches the structure in the back office Media Library. For this I followed the advice given here: https://our.umbraco.org/forum/umbraco-7/developing-umbraco-7-packages/60534-Physical-structure-of-media-folder.
I am performing several steps that contribute to the slowness. Some of it comes from just finding (or, if necessary creating) the appropriate IMedia node that will become the parent of the image node in the Media Library. Another part is the creation of the image and thumbnail files themselves. Finally, I am publishing content nodes to which I am attaching these images once they have been imported.
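For the parent lookup step, one option is to remember folder IDs by path, so that each folder is found (or created) only once during the import. This is only a rough sketch and not Umbraco code: FolderIdCache and the findOrCreate callback are made-up names, and the callback stands in for whatever MediaService calls you already use to find or create a child folder.

```csharp
using System;
using System.Collections.Generic;

// Caches media-folder IDs by path so that each folder is looked up
// (or created) at most once during the import.
public sealed class FolderIdCache
{
    private readonly Dictionary<string, int> _idsByPath =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, int, int> _findOrCreate;
    private readonly int _rootId;

    // findOrCreate(name, parentId) is a hypothetical callback: it returns the
    // ID of the child folder called "name" under "parentId", creating the
    // folder first if it does not exist yet.
    public FolderIdCache(int rootId, Func<string, int, int> findOrCreate)
    {
        _rootId = rootId;
        _findOrCreate = findOrCreate;
    }

    // Resolves a path such as "2015/03/events" to the ID of its deepest folder.
    public int GetFolderId(string path)
    {
        int parentId = _rootId;
        string current = "";
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string segment in segments)
        {
            current = current + "/" + segment;

            int id;
            if (!_idsByPath.TryGetValue(current, out id))
            {
                // Only reached the first time this folder is seen.
                id = _findOrCreate(segment, parentId);
                _idsByPath[current] = id;
            }
            parentId = id;
        }
        return parentId;
    }
}
```

You would create one cache for the whole run, with -1 (the media root) as rootId, and pass the result of GetFolderId(...) as the parent when creating each image. With 140,000 images spread over far fewer folders, most calls should then be answered from the dictionary instead of the database.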
Since I don't really care about indexes (which I can rebuild afterwards) or about updating the web servers' caches during the import, I will try disabling both of those. I can also prevent some of the publish-related events from propagating, which will speed things up. I will try that and see what happens.
But the original reason for the post was that it seems to me that it would be a lot easier if I just knew the SQL needed to insert the images. That way there is less copying of images and fewer updates to the database, the cache, and the files themselves. I could determine the SQL with some tracing, but I wondered if anyone had other ideas before I got my hands soooo dirty.
Thanks again, Paul.
If I understand it correctly, setting "enableDefaultEventHandler" to false will prevent the "background" update of the Examine indexes.
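As far as I know, that attribute goes on the index provider entries in config/ExamineSettings.config. Something like the excerpt below, assuming a default Umbraco 7 install where media is indexed by the provider named InternalIndexer; keep the other attributes exactly as they already are in your file, and check the provider names against your own config.

```xml
<!-- config/ExamineSettings.config (excerpt, other attributes omitted) -->
<Examine>
  <ExamineIndexProviders>
    <providers>
      <add name="InternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           enableDefaultEventHandler="false" />
    </providers>
  </ExamineIndexProviders>
</Examine>
```

After the import, set it back to true (or remove the attribute) and rebuild the indexes.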