Search for and boost a specific stop word in Examine
Is it possible to search for (and boost) a stop word with Examine? This is an exception rather than the rule for the site search so switching analysers is not an ideal solution.
The customer needs to increase the visibility of a particular term relating to wills, but I believe "will" is a stop word, so we're not seeing the results we expect.
We have a field in Umbraco that lets editors enter keywords to boost certain page results: if a page has any words specified in that field, each of them gets a boost. That does not appear to be the case with the stop word, though. I understand why, but wondered if anyone has any ideas on how to get around it for a single term.
In case it's relevant, this is currently a v7 site.
After spending quite some time investigating this, I discovered that you cannot simply alter the stop word list (STOP_WORDS_SET) used by the StandardAnalyzer. When you inspect it, instead of a plain Hashtable you get what is essentially a read-only wrapper around the Hashtable, called an unmodifiableSet.
It turns out that not a lot of code is required to overcome this, so I created a new analyzer that inherits from StandardAnalyzer and passes a modified stop word list into the base constructor. In case it is of use to anyone else, here is what I've done.
using System;
using System.Collections;
using System.Linq;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

namespace My.Namespace
{
    // StandardAnalyzer with "will" removed from the stop word list,
    // so that the term is indexed and can be searched and boosted.
    public class MyStandardAnalyser : StandardAnalyzer
    {
        public MyStandardAnalyser()
            : base(Lucene.Net.Util.Version.LUCENE_29, ENGLISH_STOP_WORDS_SET)
        {
        }

        private static Hashtable ENGLISH_STOP_WORDS_SET
        {
            get
            {
                // Start from the default stop words and drop the one we need to keep.
                var stopWords = StandardAnalyzer.STOP_WORDS;
                var set = stopWords
                    .Where(w => !string.Equals(w, "will", StringComparison.OrdinalIgnoreCase))
                    .ToArray();

                // Build a case-insensitive set and wrap it read-only, like the default.
                var charSet = new CharArraySet(set, true);
                return CharArraySet.UnmodifiableSet(charSet);
            }
        }
    }
}
I then amended the ExamineSettings.config file so that the ExternalIndexer and ExternalSearcher both use the new analyzer.
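For anyone wanting to see roughly what that change looks like, here is a sketch of the relevant parts of ExamineSettings.config. The assembly name My.Assembly is an assumption (use whichever assembly your analyzer class is compiled into), and the type attributes shown are the usual v7 defaults, so yours may differ. Leave any other attributes on your existing provider entries as they are; only the analyzer attribute changes.

```xml
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- "My.Assembly" is a placeholder for the assembly containing MyStandardAnalyser -->
      <add name="ExternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           analyzer="My.Namespace.MyStandardAnalyser, My.Assembly" />
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <!-- Use the same analyzer for searching as for indexing -->
      <add name="ExternalSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="My.Namespace.MyStandardAnalyser, My.Assembly" />
    </providers>
  </ExamineSearchProviders>
</Examine>
```

Since the stop words are stripped at index time, you will most likely need to rebuild the External index after making this change before "will" starts showing up in results.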