Change how Umbraco/Examine Stores Rich Content Fields?
I have Examine up and running now.
It was probably working before, but I didn't realize that the WhitespaceAnalyzer that is on by default is case-sensitive. I moved it over to StandardAnalyzer and it almost works the way I want.
The problem is that when rich text (WYSIWYG) content is saved, all the tags are stripped out but are not replaced with whitespace. The end result is that the words before and after each tag are smushed together in the index, making those words unsearchable.
Is there any way to change Umbraco so it inserts a space for every tag stripped out?
Not sure about replacing stripped tags with whitespace, but it is possible to force Examine to index the RTE content with the tags in place, if that helps. The code below creates a field in the index containing the rich content from the bodyText field:
using System;
using System.Xml.Linq;
using Examine;
using Examine.LuceneEngine.Providers;
using umbraco.BusinessLogic;

namespace EventHandlers
{
    public class EventHandlers : ApplicationBase
    {
        public EventHandlers()
        {
            // Use your own index name here.
            var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["YourIndexName"];
            indexer.GatheringNodeData += new EventHandler<IndexingNodeDataEventArgs>(indexer_GatheringNodeData);
        }

        void indexer_GatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            // Copy the raw bodyText (tags included) into its own index field.
            XElement node = e.Node;
            XElement elementBodyText = node.Element("bodyText");
            if (elementBodyText != null)
            {
                e.Fields.Add("BodyTextWithTags", elementBodyText.Value);
            }
        }
    }
}
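If the goal is to have each stripped tag replaced by a space, one possible approach is to do the stripping yourself before the text reaches the index. The following is only a rough sketch, not something built into Umbraco or Examine: the HtmlText class and StripTagsWithSpaces method are made-up names, and the regular expression is a simple pattern that assumes reasonably well-formed markup.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Umbraco or Examine).
public static class HtmlText
{
    // Matches any tag, e.g. <p>, </p>, <br />.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Matches runs of whitespace so they can be collapsed to a single space.
    private static readonly Regex Spaces = new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces every tag with a space, then collapses and trims whitespace.
    public static string StripTagsWithSpaces(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        string spaced = Tag.Replace(html, " ");
        return Spaces.Replace(spaced, " ").Trim();
    }
}
```

Inside indexer_GatheringNodeData this could be used along the lines of e.Fields.Add("BodyTextSpaced", HtmlText.StripTagsWithSpaces(elementBodyText.Value)), where "BodyTextSpaced" is just an example field name. One trade-off to be aware of: inline tags in the middle of a word (for example a bold first letter) would also be turned into a space and split that word.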
Also, is there a way to override the stop words?
Sam,
With regard to stopwords, see http://our.umbraco.org/forum/developers/extending-umbraco/25600-Examine-case-insensitive-keyword-search
Regards
Ismail