  • Luc van Soest 13 posts 148 karma points
    Mar 13, 2017 @ 15:42
    0

    Problem with Examine indexes with 15,000+ media items (missing items)

    For a customer I developed a website (currently running v7.5.10) with a lot of media items (15,000+) that have to be searchable, so I configured a custom Examine index. It works fine with a few hundred media items on my local development machine. When I updated ExamineIndex.config on the production environment, the directory and files for the newly configured index were generated just fine.

    And now my issue: after some testing and checking, I found that only about 2,500 of the 15,000 items were indexed.

    I tried rebuilding the index via Examine Management in the Developer section without any success. Then I tried stopping the application, emptying the index directory and starting the application again, but still the largest part of the items were missing from the newly generated index.

    I checked the umbraco-log and there are no errors to be seen.

    If I open one of the media items that is missing from the index and save it, it gets added to my index without any issues. With this in mind I tried to write a script, using the MediaService, that loops through all 15,000+ items and saves them, but Umbraco's MediaService seems to use very inefficient SQL queries, which cause the customer's SQL Server to break down, so this is not an option.

    I assume my problem has something to do with the large number of media-items which causes problems with the initial build of the index.

    I have already spent a massive amount of time searching for a solution and I'm getting pretty desperate by now.

    Any help will be highly appreciated!

  • Nicholas Westby 2054 posts 7103 karma points c-trib
    Mar 13, 2017 @ 18:39
    1

    See the comments in this ticket: http://issues.umbraco.org/issue/U4-9598

    Apparently, Examine can sometimes have issues if you have more than 10,000 items to index:

    I figured it out. The problem was in the UmbracoExamine.UmbracoContentIndexer's ReindexWithXmlEntries method. The indexing is done in pages with a max page size of 10,000 nodes. The do-while loop in the ReindexWithXmlEntries() first calls its getPagedXmlEntries function to go out and fetch the current page. One of the things that this getPagedXmlEntries function made sure to do was filter out any results that were the children of unpublished parent nodes. Unfortunately, the do-while loop in the ReindexWithXmlEntries() method would only move on to the next page if the number of filtered results from getPagedXmlEntries was the same as the page size. For sites with more than 10,000 nodes where some of the nodes are unpublished, deeply nested nodes get unindexed.

    Maybe that relates to your situation since you have 15,000+ items.
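To make the paging problem described in that ticket comment easier to picture, here is a rough simulation of it. This is not Umbraco or Examine code: the function names, the tiny page size, and the filter are all made up for illustration, and it only mirrors the behaviour as the comment describes it (a loop that continues only when the filtered page is exactly as long as the page size).

```python
from typing import Callable, List

# Hypothetical stand-in for the 10,000-node page size mentioned in the ticket.
PAGE_SIZE = 4


def fetch_page(items: List[int], page_index: int, page_size: int,
               keep: Callable[[int], bool]) -> List[int]:
    """Fetch one raw page, then filter out excluded entries (hypothetical helper)."""
    start = page_index * page_size
    raw = items[start:start + page_size]
    return [item for item in raw if keep(item)]


def reindex_buggy(items: List[int], page_size: int,
                  keep: Callable[[int], bool]) -> List[int]:
    """Continue only while the FILTERED page is exactly page_size long."""
    indexed: List[int] = []
    page_index = 0
    while True:
        page = fetch_page(items, page_index, page_size, keep)
        indexed.extend(page)
        page_index += 1
        # One filtered-out item makes the page look like the last page.
        if len(page) != page_size:
            break
    return indexed


def reindex_fixed(items: List[int], page_size: int,
                  keep: Callable[[int], bool]) -> List[int]:
    """Decide whether to continue from the raw item count, not the filtered count."""
    indexed: List[int] = []
    page_index = 0
    while page_index * page_size < len(items):
        indexed.extend(fetch_page(items, page_index, page_size, keep))
        page_index += 1
    return indexed


if __name__ == "__main__":
    nodes = list(range(10))          # ten pretend node ids
    keep = lambda node: node != 2    # pretend node 2 gets filtered out
    print(reindex_buggy(nodes, PAGE_SIZE, keep))
    print(reindex_fixed(nodes, PAGE_SIZE, keep))
```

In this sketch a single filtered item on the first page is enough for the buggy loop to stop, so everything on later pages never reaches the index. Whether the same code path applies to a media index is an open question; the ticket comment talks about unpublished content nodes.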
