I've noticed when running an Examine search that words in the main rich-text content aren't being matched in some instances. When I looked at the raw index I could see that Examine has stripped out the HTML tags (which makes sense), but it hasn't left any whitespace in their place. This causes adjacent words to run into each other in the index.
To illustrate, take this original HTML content in the RTE:
<p><b>Knowledge, Skills & Abilities</b></p><ul><li><p>Excellent attention to detail, and ability to create wire-frames, user flows, mock-ups, interactive design and prototypes.</p></li><li><p>Excellent understanding of information architecture.</p></li><li></ul>
When Examine indexes it, it is converted to plain-text like this:
Knowledge, Skills & AbilitiesExcellent attention to detail, and ability to create wire-frames, user flows, mock-ups, interactive design and prototypes.Excellent understanding of information architecture.
You can see in the Examine index that wherever tags were removed the words run together, such as "AbilitiesExcellent" or "prototypes.Excellent", and this breaks matching if you search for, say, "prototypes".
So is there a way of getting it to replace tags with whitespace when indexing, to preserve words?
How Examine Indexes HTML (Spacing issue)
You could choose another analyzer/tokenizer for the fields in question.
See example number two under Overriding index creation here:
https://our.umbraco.com/documentation/Reference/Searching/Examine/indexing/
I had the docs open and have just written something to index paths in parts, so it was a bit lucky that I could just paste this example. 😆
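If a different analyzer doesn't help, another option is to clean the HTML yourself before the value reaches the index, replacing each tag with a space instead of nothing. Below is a minimal, language-neutral sketch of that idea in Python. The function name and regexes are my own illustration, not Examine's implementation. On an Umbraco site you would write the same replacement in C# inside whatever index-time hook your Examine version provides (see the docs linked above).

```python
import html
import re

# Naive tag matcher: anything between "<" and ">". This is an assumption that
# suits simple RTE output; it is not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def strip_tags_keep_spaces(markup: str) -> str:
    """Replace every tag with a space, decode entities, collapse whitespace."""
    text = TAG_RE.sub(" ", markup)      # "<p>" -> " " so neighbours stay apart
    text = html.unescape(text)          # "&amp;" -> "&"
    return WS_RE.sub(" ", text).strip() # tidy up the runs of spaces


sample = (
    "<p><b>Knowledge, Skills & Abilities</b></p>"
    "<ul><li><p>Excellent attention to detail.</p></li></ul>"
)
print(strip_tags_keep_spaces(sample))
```

With the sample above, "Abilities" and "Excellent" end up as separate words. One trade-off to be aware of: inline tags inside a single word (for example a <b> wrapped around half a word) would also be split by a space, so a stricter version might only insert spaces for block-level tags such as p, li and br.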