Large media index takes ages to run

Mayfly Media 12 posts 177 karma points

Aug 20, 2015 @ 09:58

Large media index takes ages to run

We have an Umbraco 6.2.5 site within which we are storing approximately 200,000 media items in the Umbraco media library. The images we are storing are all between about 500KB and 1MB in size.

We use the CogUmbracoExamineMediaIndexer to index our media. When media items are uploaded in bulk (20 at a time), the media indexer kicks off, but the site's memory usage gradually increases and the CPU maxes out. The process appears to run for many hours before dying down.

Often the client performs multiple bulk uploads in a day, resulting in the indexer running almost continuously.

A couple of questions: any ideas why the indexer would run for so long for only 20 images? Is the high memory and CPU usage a known issue with the indexer?

Shannon Deminick 1530 posts 5278 karma points MVP 3x

Aug 20, 2015 @ 10:05

Firstly, the performance problem is the data lookups. The actual indexing process is CPU intensive, but it is fast.

I don't know what CogUmbracoExamineMediaIndexer does, but I would suspect the problem is part of that. Perhaps for each item it's also doing some queries or other operations (e.g. if it's analyzing each image, that would be really, really terrible for performance) and you'll have N+1. My advice would be to start looking there to see what is happening.

Also, what version of Umbraco are you using? This can greatly change the performance. Older versions of Umbraco don't look up data in a very efficient manner.
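To make the N+1 point concrete, here is a minimal sketch in Python. It is not Umbraco or Examine code: the `media` and `props` tables, the `size` alias, and all function names are hypothetical, made up purely to show the difference between one lookup per item and a single batched lookup.

```python
import sqlite3


def make_db(n_items):
    """Build a throwaway in-memory database (hypothetical schema, not Umbraco's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE props (media_id INTEGER, alias TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO media VALUES (?, ?)",
        [(i, f"img{i}.jpg") for i in range(n_items)],
    )
    conn.executemany(
        "INSERT INTO props VALUES (?, 'size', ?)",
        [(i, str(500_000 + i)) for i in range(n_items)],
    )
    return conn


def load_n_plus_one(conn):
    """One query for the item list, then one extra query per item (N+1)."""
    queries = 1
    rows = conn.execute("SELECT id, name FROM media ORDER BY id").fetchall()
    docs = []
    for media_id, name in rows:
        value = conn.execute(
            "SELECT value FROM props WHERE media_id = ?", (media_id,)
        ).fetchone()[0]
        queries += 1
        docs.append((media_id, name, value))
    return docs, queries


def load_batched(conn):
    """Fetch the same data with a single joined query."""
    rows = conn.execute(
        "SELECT m.id, m.name, p.value FROM media m "
        "JOIN props p ON p.media_id = m.id ORDER BY m.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; only the number of round trips differs, and that gap grows linearly with the number of items being indexed.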

Shannon Deminick 1530 posts 5278 karma points MVP 3x

Aug 20, 2015 @ 10:06

Also, I hope you are not rebuilding this index? Adding to the index 20 at a time should be fine. Rebuilding it would be quite costly, but it should still work unless this CogUmbracoExamineMediaIndexer is doing something it shouldn't under the hood.
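A rough way to picture the cost difference between adding to an index and rebuilding it, as a toy Python sketch. The dictionary "index" and the `rebuild` and `add_items` functions are hypothetical stand-ins, not how Examine or Lucene work internally; the only point is how much work each path touches.

```python
def tokenize(text):
    """Toy analysis step: lowercase and split on whitespace."""
    return text.lower().split()


def rebuild(index, library):
    """Throw the index away and re-process every item in the library."""
    index.clear()
    for item_id, text in library.items():
        index[item_id] = tokenize(text)
    return len(library)  # number of items processed


def add_items(index, library, new_ids):
    """Process only the newly uploaded items."""
    for item_id in new_ids:
        index[item_id] = tokenize(library[item_id])
    return len(new_ids)  # number of items processed


# A library of 1000 existing items, then a bulk upload of 20 more.
library = {i: f"Image number {i}" for i in range(1000)}
index = {}
initial_work = rebuild(index, library)

new_ids = list(range(1000, 1020))
for i in new_ids:
    library[i] = f"Image number {i}"

incremental_work = add_items(index, library, new_ids)
full_work = rebuild({}, library)
```

With these made-up numbers the incremental path processes 20 items while a rebuild processes all 1020, which is why a rebuild on every upload would be expected to hurt at 200,000 items.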

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Aug 20, 2015 @ 10:07

The CogUmbracoExamineMediaIndexer uses Tika, which is Java, with an IKVM wrapper around it, so that will add up in terms of performance when you have lots of media.

Regards

Ismail

Shannon Deminick 1530 posts 5278 karma points MVP 3x

Aug 20, 2015 @ 12:23

Yikes! I'd suggest that is probably most of the issue here. You'll be spinning up a Java VM for this, which has got to be pretty processor heavy. Then I assume Tika is going to try to open up all your files to read them, which will probably occupy a lot of memory and CPU.

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Aug 20, 2015 @ 15:49

A few other people have had performance issues when indexing on the order of thousands of items; I only tested with 10 to 20 documents.

Are you looking to build some kind of front-end searchable image library? Are you using the CogUmbracoExamineMediaIndexer to get EXIF data out of images? If not, do you need it at all?
