Large media index takes ages to run
We have an Umbraco 6.2.5 site within which we are storing approximately 200,000 media items in the Umbraco media library. The images we are storing are all between about 500KB and 1MB in size.
We use the CogUmbracoExamineMediaIndexer to index our media. When media items are uploaded in bulk (20 at a time), the media indexer kicks off, but the site's memory usage gradually increases and the CPU maxes out. The process appears to run for many hours before dying down.
Often the client performs multiple bulk uploads in a day, resulting in the indexer running almost continuously.
A few questions: any ideas why the indexer could be running for so long for only 20 images? Is the high memory and CPU usage a known issue with the indexer?
Firstly, the performance problem is usually the data lookups. The actual indexing process is CPU-intensive, but it is fast.
I don't know what CogUmbracoExamineMediaIndexer does, but I would suspect the problem is part of that. Perhaps for each item it's also doing some queries or other operations (e.g. if it's analyzing each image, that would be really terrible for performance), and you'll have N+1. My advice would be to start looking there to see what is happening.
Also, what version of Umbraco are you using? This can greatly change the performance. Older versions of Umbraco don't look up data in a very efficient manner.
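To make the N+1 point concrete, here is a minimal sketch in Python. It does not use Umbraco's or Examine's API; `FakeMediaStore`, `get_one` and `get_many` are hypothetical names invented for the illustration. It only counts how many lookups each approach makes for a batch of 20 items.

```python
# Hypothetical in-memory stand-in for a media database.
# This is NOT Umbraco's or Examine's API; every name here is made up.
class FakeMediaStore:
    def __init__(self, items):
        self._items = dict(items)
        self.query_count = 0  # number of simulated round trips

    def get_one(self, media_id):
        # One simulated round trip per item.
        self.query_count += 1
        return self._items[media_id]

    def get_many(self, media_ids):
        # One simulated round trip for the whole batch.
        self.query_count += 1
        return {i: self._items[i] for i in media_ids}


def index_one_by_one(store, media_ids):
    # N+1 pattern: a separate lookup for every item being indexed.
    return [store.get_one(i) for i in media_ids]


def index_batched(store, media_ids):
    # Single lookup, then read each item from the returned batch.
    rows = store.get_many(media_ids)
    return [rows[i] for i in media_ids]


ids = list(range(20))
slow_store = FakeMediaStore((i, f"image-{i}.jpg") for i in ids)
fast_store = FakeMediaStore((i, f"image-{i}.jpg") for i in ids)

slow_result = index_one_by_one(slow_store, ids)
fast_result = index_batched(fast_store, ids)

print(slow_store.query_count, fast_store.query_count)  # prints "20 1"
```

Both functions return the same data; only the number of lookups differs. If the indexer does extra work per item (a query, or opening the file), that per-item cost is what to look for first.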
Also, I hope you are not rebuilding this index? Adding to the index 20 at a time should be fine... rebuilding it would be quite costly, but it should still work unless this CogUmbracoExamineMediaIndexer is doing something it shouldn't under the hood.
The cogmediaindexer uses Tika, which is Java, with an IKVM wrapper around it, so that will add up performance-wise when you have lots of media.
Regards
Ismail
Yikes! I'd suggest that is probably most of the issue here. You'll be spinning up a Java VM for this, which has got to be pretty processor-heavy. Then I assume Tika is going to try to open up all your files to read them, which will probably occupy a lot of memory and CPU.
A few other people have had performance issues when indexing on the order of thousands of documents; I only tested with 10 to 20.
Are you looking to build some kind of front-end searchable image library? Are you using cogmediaindexer to get EXIF data out of images? If not, do you need cogmediaindexer at all?