
  • Simon Dingley 1395 posts 3249 karma points c-trib
    Dec 15, 2016 @ 17:34

    Index Build Times - Measuring and improving

    I have a large site that takes up to an hour to rebuild its indexes, and during that time the site is unusable, which is obviously not good. Thankfully, the live site is load balanced, so we can take servers out of the load balancer one at a time to rebuild if/when necessary without taking the whole site down, but it's still a massive pain point.

    Locally, however, this is more of an issue. Are there events I can hook into to detect the start and end of the indexing process, so I can get some metrics before I start trying to optimise it?

    The new Common Pitfalls & Anti-Patterns document in the Documentation section was a real eye opener, in particular the section titled "Performing lookups and logic in Examine events", as it might point to the bottleneck in this process on the site in question.

    If anyone has any pointers on tracking before and after timings for index rebuilds that would be most helpful.

    Cheers, Simon

  • Darren Ferguson 1019 posts 3255 karma points MVP c-trib
    Dec 16, 2016 @ 09:34

    Hi Simon,

    I'd extend this class:

    https://github.com/Shazwazza/Examine/blob/master/Projects/UmbracoExamine/UmbracoContentIndexer.cs

    If you override the AddSingleNodeToIndex, RebuildIndex and ReIndexNode methods, just calling the base methods and wrapping them in before/after logging, it is quite easy to collect some detailed information.
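
    As a rough sketch of that idea (not tested against this site): the class name TimedContentIndexer is made up, the method signatures are assumed to match the linked UmbracoContentIndexer source and should be checked against the Examine version in use, and Trace.WriteLine stands in for whatever logger the site already has.

    ```csharp
    using System.Diagnostics;
    using System.Xml.Linq;
    using UmbracoExamine;

    // Hypothetical class name; not part of Examine or Umbraco.
    public class TimedContentIndexer : UmbracoContentIndexer
    {
        // Times the call that kicks off a full rebuild of this index.
        public override void RebuildIndex()
        {
            var timer = Stopwatch.StartNew();
            try
            {
                base.RebuildIndex();
            }
            finally
            {
                Trace.WriteLine(string.Format(
                    "[{0}] RebuildIndex took {1} ms", Name, timer.ElapsedMilliseconds));
            }
        }

        // Times a re-index request for a single node.
        // Signature assumed from the linked source: (XElement node, string type).
        public override void ReIndexNode(XElement node, string type)
        {
            var timer = Stopwatch.StartNew();
            try
            {
                base.ReIndexNode(node, type);
            }
            finally
            {
                Trace.WriteLine(string.Format(
                    "[{0}] ReIndexNode ({1}) took {2} ms", Name, type, timer.ElapsedMilliseconds));
            }
        }

        // Times adding one node to the index.
        // Signature assumed from the linked source: (XElement node, string type).
        protected override void AddSingleNodeToIndex(XElement node, string type)
        {
            var timer = Stopwatch.StartNew();
            try
            {
                base.AddSingleNodeToIndex(node, type);
            }
            finally
            {
                Trace.WriteLine(string.Format(
                    "[{0}] AddSingleNodeToIndex ({1}) took {2} ms", Name, type, timer.ElapsedMilliseconds));
            }
        }
    }
    ```

    To try it, the indexer's type in ExamineSettings.config would presumably need to point at this class instead of UmbracoExamine.UmbracoContentIndexer. One caveat: if the indexer is set to run asynchronously, the base calls may return before the index has actually been written, so these timings would only cover the queuing step.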

    Other advice: Just have the InternalIndexer, remove the External and do your filtering in the search.

    I also wrote an indexer for building the Internal Index which is about 4x faster than the regular indexer - which I can put code for somewhere.

    You can also disable indexing at boot completely and do it via the Umbraco backoffice which seems quicker.

    I guess the best route is to get the Azure blob storage for Examine performing well. There is a thread around that here: https://our.umbraco.org/forum/extending-umbraco-and-using-the-api/78818-using-azuredirectory-with-examine#comment-258803

    We have also submitted a PR to Umbraco to implement ISearchable tree (http://issues.umbraco.org/issue/U4-2676). This allows you to use something other than Examine to search the backoffice, such as Azure Search. I've prototyped indexing Umbraco content into Azure Search here: https://github.com/darrenferguson/UmbracoAzureSearch

  • Simon Dingley 1395 posts 3249 karma points c-trib
    Dec 16, 2016 @ 10:28

    Hi Darren,

    > I'd extend this class:
    >
    > https://github.com/Shazwazza/Examine/blob/master/Projects/UmbracoExamine/UmbracoContentIndexer.cs
    >
    > If you override the AddSingleNodeToIndex, RebuildIndex and ReIndexNode methods, just calling the base methods and wrapping them in before/after logging, it is quite easy to collect some detailed information.

    Thanks for the advice. I will take a look at the class and see if I can get something implemented to get some useful data as a starting point.

    With regard to your next bit of advice, do you mean just have a single index for the site, the internal one, and use it for everything, with checks on the front end for permissions etc.? Another potential issue is that we currently have a couple of additional custom indexes in the site for legacy reasons. This project has been around since v4 and there are a few things that are a case of "if I knew then what I know now, I wouldn't actually do it like that". I want to go back and refactor that stuff, so reducing the number of indexes will be part of that work. Your suggestion of one index is appealing!

    > I also wrote an indexer for building the Internal Index which is about 4x faster than the regular indexer - which I can put code for somewhere.

    I'm very interested in looking at that if you are willing to share it?

    > You can also disable indexing at boot completely and do it via the Umbraco backoffice which seems quicker.

    By adding RebuildOnAppStart="false" to the <examine> node in ExamineSettings.config?

    > I guess the best route is to get the Azure blob storage for Examine performing well.

    The site is not hosted on Azure so I don't think it will help in this case?

    Cheers, Simon

  • Darren Ferguson 1019 posts 3255 karma points MVP c-trib
    Dec 19, 2016 @ 09:17

    > By adding RebuildOnAppStart="false" to the <examine> node in ExamineSettings.config?

    Yes :)
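
    For anyone following along, the change would look roughly like this. It is a fragment only: the root element casing (Examine) and the child section names are assumed from a stock Umbraco 7 ExamineSettings.config, and the existing providers should be left exactly as they are.

    ```xml
    <!-- ExamineSettings.config (fragment; existing content omitted) -->
    <Examine RebuildOnAppStart="false">
      <ExamineIndexProviders>
        <providers>
          <!-- existing index providers stay unchanged -->
        </providers>
      </ExamineIndexProviders>
      <!-- existing ExamineSearchProviders section stays unchanged -->
    </Examine>
    ```

    With this set, the indexes are not rebuilt at startup, so they would need to be rebuilt from the backoffice when required.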

    > The site is not hosted on Azure so I don't think it will help in this case?

    Probably not just now - but you could use an externally based index to prevent startup build times (once this works)...

    I'm preparing my indexer code for github just now...
