Some pages not going into Examine index
Hi everyone,
I've got a bit of a head-scratcher here. I'm running Umbraco 10.4, although I was having the same issue in 10.3.2.
I have a document type called hazardPage, but some pages simply aren't appearing in the Examine/Lucene index. If I go to Examine Management and search for __NodeTypeAlias:hazardPage then some pages show up but others don't.
It doesn't matter whether I search the internal or external index. If I use the "global" search in the top-right of Umbraco and type the name of one of the missing pages, it doesn't show up.
Rebuilding the index hasn't helped, and resaving/publishing the problematic pages hasn't helped. I'm not sure what else to try. Any ideas?
Edit: I shut down the site, deleted the index, and let it rebuild. Then I noticed this entry in the log:
System.ArgumentException: Document contains at least one immense term in field="__Icon" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22]...'
at Lucene.Net.Index.DocInverterPerField.ProcessFields(IIndexableField[] fields, Int32 count)
at Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Builder fieldInfos)
at Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(IEnumerable`1 doc, Analyzer analyzer, Term delTerm)
at Lucene.Net.Index.DocumentsWriter.UpdateDocument(IEnumerable`1 doc, Analyzer analyzer, Term delTerm)
at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, IEnumerable`1 doc, Analyzer analyzer)
at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, IEnumerable`1 doc)
at Lucene.Net.Index.TrackingIndexWriter.UpdateDocument(Term t, IEnumerable`1 d)
at Examine.Lucene.Providers.LuceneIndex.AddDocument(Document doc, ValueSet valueSet)
at Examine.Lucene.Providers.LuceneIndex.ProcessIndexQueueItem(IndexOperation op)
at Examine.Lucene.Providers.LuceneIndex.ProcessQueueItem(IndexOperation item)
at Examine.Lucene.Providers.LuceneIndex.PerformIndexItemsInternal(IEnumerable`1 valueSets, CancellationToken cancellationToken)
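For what it's worth, the hex bytes quoted in that message are just the start of the offending value, so they can be decoded to see what is being indexed. A minimal Python sketch, nothing Umbraco-specific; the `prefix_hex` string is copied from the log entry and the variable names are my own:

```python
# Hex bytes copied from "The prefix of the first immense term" in the log entry.
prefix_hex = (
    "3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 "
    "65 6e 63 6f 64 69 6e 67 3d 22"
)

# bytes.fromhex skips the spaces between the byte pairs.
prefix_text = bytes.fromhex(prefix_hex).decode("utf-8")
print(prefix_text)  # <?xml version="1.0" encoding="
```

So the value starts with an XML declaration, which points at an entire XML document being stored in the field.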
I'm guessing __Icon is a system field (with the double underscores) so is there some way I can figure out which page is causing the issue?
I've figured it out.
There's a field on the hazardPage document type called "icon", of type Umbraco.TextArea. It seems that Lucene won't index this as a single term once its UTF-8 encoding goes over 32766 bytes, and one of the pages has 68162 characters in there.
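To track down which pages are affected, the check boils down to comparing each value's UTF-8 byte length with the 32766 limit from the error message. A rough Python sketch, assuming the "icon" values have already been exported into a dict keyed by page name; the `icons` sample data and the `find_immense_values` helper are made up for illustration and are not part of Umbraco or Examine:

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Lucene error message


def find_immense_values(values, limit=MAX_TERM_BYTES):
    """Return {page name: UTF-8 byte length} for values longer than the limit."""
    oversized = {}
    for name, text in values.items():
        size = len(text.encode("utf-8"))  # the limit is in bytes, not characters
        if size > limit:
            oversized[name] = size
    return oversized


# Hypothetical sample data standing in for exported "icon" property values.
icons = {
    "Short icon": "<svg/>",
    "Huge icon": "x" * 68162,
    "Non-ASCII icon": "\u00e9" * 20000,  # 20000 characters, but 40000 bytes
}

print(find_immense_values(icons))
# {'Huge icon': 68162, 'Non-ASCII icon': 40000}
```

Note that the limit applies to the encoded bytes, so a value with non-ASCII characters can trip it while still being under 32766 characters.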
Is this technically a bug? Should Umbraco default to putting a 32766 character limit on a TextArea if it breaks Lucene?
But apart from that, the problem is solved for now :)