  • Mark 255 posts 612 karma points
    Sep 15, 2015 @ 09:19

    Examine Index: Media items not being removed

    Umbraco 6.1.5

    Under "Examine Management" in the Developer section, our media indexer showed the following:

    Has deletions? / Optimized? = true (21) / false

    We had a document that was showing up in search results (we query the Lucene index with the StandardAnalyzer, and it is also indexed with the StandardAnalyzer). This document had been removed from the Media section (by deleting it and then emptying the recycle bin).

    We rebuilt the index, and the values then changed to:

    Has deletions? / Optimized? = false (0) / true

    After that, the document no longer showed up in search results, i.e. it had been removed from the index.

    I assume the document wasn't one of the items marked as deleted, since Lucene doesn't return deleted documents in search results. Before the rebuild, the indexer also showed:

    Documents in index = 3011

    Afterwards it showed:

    Documents in index = 2461

    That is a drop of 550 documents, far more than the 21 marked as deleted.

    I'm not clear on how Umbraco or Lucene handles the removal of documents that are marked for deletion, or how optimization works; perhaps someone can explain. Mainly, though, I'd like to know why the index had over 500 documents hanging around that were not marked for deletion. Any ideas?
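    To illustrate the distinction in question, here is a toy Python model. It is not the Lucene.Net or Examine API; the class and its methods are hypothetical, loosely named after Lucene's maxDoc / numDocs / hasDeletions counts. It assumes the usual Lucene behaviour that a delete only marks a document, and that optimizing physically purges the marked entries. The point it shows: a document whose deletion never reached the indexer is not "marked as deleted" at all, so it stays live and searchable, and optimizing does nothing to it. Only an explicit delete or a rebuild removes it.

    ```python
    # Toy model (hypothetical, NOT the Lucene or Examine API) of a Lucene-style
    # index: deleting only marks a document; optimize() purges marked documents.

    class ToyIndex:
        def __init__(self):
            self.docs = {}        # doc id -> stored text
            self.deleted = set()  # ids marked deleted but not yet purged

        def add(self, doc_id, text):
            self.docs[doc_id] = text
            self.deleted.discard(doc_id)

        def delete(self, doc_id):
            # Only marks the document; it still occupies space in the index.
            if doc_id in self.docs:
                self.deleted.add(doc_id)

        def max_doc(self):
            # Counts every stored document, including ones marked deleted.
            return len(self.docs)

        def num_docs(self):
            # Counts only live (searchable) documents.
            return len(self.docs) - len(self.deleted)

        def has_deletions(self):
            return bool(self.deleted)

        def search(self, term):
            # Documents marked deleted are never returned.
            return sorted(i for i, t in self.docs.items()
                          if i not in self.deleted and term in t)

        def optimize(self):
            # Physically purge documents marked deleted.
            for doc_id in self.deleted:
                del self.docs[doc_id]
            self.deleted.clear()


    idx = ToyIndex()
    idx.add(1, "annual report")
    idx.add(2, "annual budget")
    idx.delete(1)  # this delete event reached the index
    # Item 2 was deleted in the CMS, but the event never reached the index,
    # so it is still live and searchable.
    print(idx.search("annual"))                                 # [2]
    print(idx.max_doc(), idx.num_docs(), idx.has_deletions())  # 2 1 True
    idx.optimize()
    print(idx.max_doc(), idx.num_docs(), idx.has_deletions())  # 1 1 False
    print(idx.search("annual"))                                 # [2] (stale doc survives)
    ```

    Under this model, the "(21)" deletions and the stale documents are separate things: optimizing clears the first, but only a rebuild clears the second.
    
    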

    The umbracoLog table has no error entries attributed to media item deletions. Does the GatheringNodeData event automatically log errors to the umbracoLog table if an error occurs?

    Is there a way to automate the rebuilding of an index?

  • Mark 255 posts 612 karma points
    Sep 16, 2015 @ 08:23

    We have just discovered a similar issue with the InternalIndexer. Files that were deleted from the Media section (and the recycle bin emptied) still existed in the Lucene index, so it seems to be a general problem. Is there a solution?

    I'm planning to write a scheduled task to periodically rebuild the index, but it would be good to know if there's an alternative. I would expect the Lucene indexes to match the items in the Media section.
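    Rather than rebuilding blindly on every run, a scheduled task could first check whether the index has drifted from the Media section. Below is a minimal, language-neutral sketch of that comparison in Python. It assumes you can already obtain the list of existing media node ids and the list of ids stored in the index; the function names and the threshold policy are hypothetical, and nothing here calls Umbraco or Examine.

    ```python
    # Hypothetical drift check between the Media section and the index.
    # Obtaining the two id lists (database query, index search) is left to
    # the caller; this only compares them.

    def find_stale_ids(media_ids, indexed_ids):
        """Return ids that are in the index but no longer exist as media items."""
        return sorted(set(indexed_ids) - set(media_ids))


    def find_missing_ids(media_ids, indexed_ids):
        """Return ids of media items that exist but are not in the index."""
        return sorted(set(media_ids) - set(indexed_ids))


    def needs_rebuild(media_ids, indexed_ids, threshold=0):
        """Hypothetical policy: rebuild when total drift exceeds threshold."""
        drift = (len(find_stale_ids(media_ids, indexed_ids))
                 + len(find_missing_ids(media_ids, indexed_ids)))
        return drift > threshold


    media = [1001, 1002, 1003]
    indexed = [1001, 1002, 1003, 1050, 1051]
    print(find_stale_ids(media, indexed))    # [1050, 1051]
    print(find_missing_ids(media, indexed))  # []
    print(needs_rebuild(media, indexed))     # True
    ```

    The stale ids could also be deleted from the index individually instead of triggering a full rebuild; which is cheaper depends on how large the drift usually is.
    
    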

  • Mark 255 posts 612 karma points
    Sep 17, 2015 @ 12:07

    Looks like I'll have to post on Stack Overflow...
