Index Build Times - Measuring and improving
I have a large site that takes up to an hour to rebuild its indexes, and the site is unusable during that time, which is obviously not good. Thankfully, in the live environment the site is load balanced, so we can take servers out of the load balancer one at a time to rebuild if and when necessary without taking the whole site down, but it's still a massive pain point.
However, locally this is more of an issue. Are there events I can hook into to detect the start and end of the indexing process, so I can gather some metrics before I start trying to optimise it?
The new Common Pitfalls & Anti-Patterns document in the Documentation section was a real eye-opener, in particular the section titled "Performing lookups and logic in Examine events", as it might point to the bottleneck in this process on the site in question.
If anyone has any pointers on tracking before and after timings for index rebuilds, that would be most helpful.
Cheers, Simon
Hi Simon,
I'd extend this class:
https://github.com/Shazwazza/Examine/blob/master/Projects/UmbracoExamine/UmbracoContentIndexer.cs
If you override the AddSingleNodeToIndex, RebuildIndex and ReIndexNode methods, just calling the base methods and wrapping them in before/after logging calls, it is quite easy to collect some detailed information.
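As a rough sketch of that idea (not tested against any particular Examine version): the class name TimedContentIndexer and the LogElapsed helper are made up for illustration, and the method signatures are assumed to match the UmbracoContentIndexer.cs linked above, so check them against the version you have installed. Trace is used only to keep the example free of extra dependencies; swap in whatever logger the site already uses.

```csharp
using System.Diagnostics;
using System.Xml.Linq;
using UmbracoExamine;

// Hypothetical subclass: it adds timing around the base calls and nothing else.
public class TimedContentIndexer : UmbracoContentIndexer
{
    // Times a full rebuild of this index.
    public override void RebuildIndex()
    {
        var timer = Stopwatch.StartNew();
        try { base.RebuildIndex(); }
        finally { LogElapsed("RebuildIndex", timer); }
    }

    // Times a re-index request for a single node.
    public override void ReIndexNode(XElement node, string type)
    {
        var timer = Stopwatch.StartNew();
        try { base.ReIndexNode(node, type); }
        finally { LogElapsed("ReIndexNode id=" + (string)node.Attribute("id") + " type=" + type, timer); }
    }

    // Times each node as it is added to the index.
    protected override void AddSingleNodeToIndex(XElement node, string type)
    {
        var timer = Stopwatch.StartNew();
        try { base.AddSingleNodeToIndex(node, type); }
        finally { LogElapsed("AddSingleNodeToIndex id=" + (string)node.Attribute("id") + " type=" + type, timer); }
    }

    // Hypothetical helper: writes the indexer name, the operation and the elapsed time.
    private void LogElapsed(string operation, Stopwatch timer)
    {
        timer.Stop();
        Trace.WriteLine("[" + Name + "] " + operation + " took " + timer.ElapsedMilliseconds + " ms");
    }
}
```

To use something like this, the type attribute of the relevant indexer entry in ExamineSettings.config would need to point at the new class. One caveat: if the indexer processes its queue on a background thread, the RebuildIndex figure may only cover the time until the call returns, so the per-node timings are likely to be the more useful numbers.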
Other advice: just have the InternalIndexer, remove the External one and do your filtering in the search.
I also wrote an indexer for building the Internal Index which is about 4x faster than the regular indexer. I can put the code for it somewhere.
You can also disable indexing at boot completely and do it via the Umbraco backoffice which seems quicker.
I guess the best route is to get the Azure blob storage for Examine performing well. There is a thread around that here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine#comment-258803
We have also submitted a PR to Umbraco to implement ISearchable tree: http://issues.umbraco.org/issue/U4-2676. This allows you to use something other than Examine to search the backoffice, such as Azure Search. I've prototyped indexing Umbraco content into Azure Search here: https://github.com/darrenferguson/UmbracoAzureSearch
Hi Darren,
Thanks for the advice. I will take a look at the class and see if I can get something implemented to gather some useful data as a starting point.
With regard to your next bit of advice, do you mean just have a single index for the site (the internal one) and use it for everything, with checks on the front end for permissions etc.? Another potential issue is that we currently have a couple of additional custom indexes in the site for legacy reasons. This project has been around since v4 and there are a few things that are a case of "if I knew then what I know now, I wouldn't actually do it like that". I want to go back and refactor that stuff, so reducing the number of indexes will be a part of that work. Your suggestion of one index is appealing!
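On the faster indexer for the Internal Index: I'm very interested in looking at that if you are willing to share it.
On disabling indexing at boot: is that done by adding RebuildOnAppStart="false" to the <examine> node in ExamineSettings.config?
On the Azure blob storage for Examine: the site is not hosted on Azure, so I don't think it will help in this case?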
Cheers, Simon
Yes :)
Probably not just now - but you could use an externally based index to prevent startup build times (once this works)...
I'm preparing my indexer code for github just now...
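For reference, the RebuildOnAppStart setting confirmed above would look roughly like this. This is only a fragment, assuming the usual ExamineSettings.config layout where the attribute sits on the root element; the existing providers stay exactly as they are, and the element casing should match whatever is already in your file.

```xml
<!-- ExamineSettings.config: only the root attribute changes -->
<Examine RebuildOnAppStart="false">
  <ExamineIndexProviders>
    <providers>
      <!-- existing index providers unchanged -->
    </providers>
  </ExamineIndexProviders>
  <!-- existing search providers section unchanged -->
</Examine>
```

With this in place the indexes are no longer rebuilt on startup, so a rebuild has to be triggered by hand from the backoffice when one is needed.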