  • Chris Mahoney 235 posts 447 karma points
    Jan 16, 2023 @ 21:15

    Some pages not going into Examine index

    Hi everyone,

    I've got a bit of a head-scratcher here. I'm running Umbraco 10.4, although I was having the same issue in 10.3.2.

    I have a document type called hazardPage, but some pages simply aren't appearing in the Examine/Lucene index. If I go to Examine Management and search for __NodeTypeAlias:hazardPage then some pages show up but others don't.

    It doesn't matter whether I search the internal or external index. If I use the "global" search in the top-right of Umbraco and type the name of one of the missing pages, it doesn't show up.

    Rebuilding the index hasn't helped, and resaving/publishing the problematic pages hasn't helped. I'm not sure what else to try. Any ideas?

    Edit: I shut down the site, deleted the index, and let it rebuild. Then I noticed this entry in the log:

    System.ArgumentException: Document contains at least one immense term in field="__Icon" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22]...'
       at Lucene.Net.Index.DocInverterPerField.ProcessFields(IIndexableField[] fields, Int32 count)
       at Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Builder fieldInfos)
       at Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(IEnumerable`1 doc, Analyzer analyzer, Term delTerm)
       at Lucene.Net.Index.DocumentsWriter.UpdateDocument(IEnumerable`1 doc, Analyzer analyzer, Term delTerm)
       at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, IEnumerable`1 doc, Analyzer analyzer)
       at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, IEnumerable`1 doc)
       at Lucene.Net.Index.TrackingIndexWriter.UpdateDocument(Term t, IEnumerable`1 d)
       at Examine.Lucene.Providers.LuceneIndex.AddDocument(Document doc, ValueSet valueSet)
       at Examine.Lucene.Providers.LuceneIndex.ProcessIndexQueueItem(IndexOperation op)
       at Examine.Lucene.Providers.LuceneIndex.ProcessQueueItem(IndexOperation item)
       at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
    

    I'm guessing __Icon is a system field, given the double underscores. Is there some way I can figure out which page is causing the issue?
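
    The byte prefix quoted in the exception is a clue in itself. As a minimal sketch (Python is used purely for illustration and is not part of Umbraco or Examine; `decode_term_prefix` is a hypothetical helper name), decoding those hex bytes shows what the oversized value starts with:

```python
# Hex bytes copied from the "prefix of the first immense term" in the log above.
prefix_hex = (
    "3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 "
    "65 6e 63 6f 64 69 6e 67 3d 22"
)


def decode_term_prefix(hex_bytes: str) -> str:
    """Turn the space-separated hex dump from the Lucene error into text."""
    return bytes.fromhex(hex_bytes).decode("utf-8", errors="replace")


decoded_prefix = decode_term_prefix(prefix_hex)
print(decoded_prefix)
```

    The decoded text is the start of an XML declaration, so the offending value appears to be an entire XML document rather than a short icon name. That narrows the search to pages where an icon-related field holds pasted markup.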

  • Chris Mahoney 235 posts 447 karma points
    Jan 16, 2023 @ 23:52

    I've figured it out.

    There's a field on the hazardPage document type called "icon", of type Umbraco.TextArea. Going by the error message, Lucene skips any single term whose UTF-8 encoding is longer than 32766 bytes... and one of the pages has 68162 characters in there.

    Is this technically a bug? Should Umbraco default to putting a 32766 character limit on a TextArea if it breaks Lucene?

    But apart from that, the problem is solved for now :)
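
    Since the limit in the log applies to the UTF-8 byte length rather than the character count, a check along these lines could flag oversized values before they reach the index. This is only a sketch in Python for illustration: the `pages` dictionary and the `find_oversized_fields` helper are hypothetical, and on a real site the values would have to come from exported content rather than literals. It also assumes the value is indexed as one term, as it evidently was for __Icon here.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Lucene exception message


def find_oversized_fields(pages, limit=MAX_TERM_BYTES):
    """Return (page, field, byte_length) for values too long to index as one term."""
    oversized = []
    for page_name, fields in pages.items():
        for field_name, value in fields.items():
            # Lucene measures the UTF-8 encoding, so count bytes, not characters.
            byte_length = len(value.encode("utf-8"))
            if byte_length > limit:
                oversized.append((page_name, field_name, byte_length))
    return oversized


# Hypothetical sample data: one short value and one with the 68162
# characters mentioned in the post (ASCII, so 68162 bytes as well).
pages = {
    "Page A": {"icon": "icon-alert"},
    "Page B": {"icon": "x" * 68162},
}
oversized = find_oversized_fields(pages)
print(oversized)
```

    Counting bytes matters for non-ASCII text: 20000 two-byte characters would already exceed the limit even though the character count is well under 32766.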
