  • Simon Dingley 1470 posts 3427 karma points c-trib
    Aug 19, 2022 @ 08:57
    0

    Search for and boost a specific stop word in Examine

    Is it possible to search for (and boost) a stop word with Examine? This is an exception rather than the rule for the site search, so switching analysers is not an ideal solution.

    The customer needs to increase the visibility of a particular term relating to wills, but I believe "will" is a stop word, so we're not seeing the results we expect.

    We have a field in Umbraco where keywords can be entered to boost certain page results, so a page gets a boost for each word specified in that field. However, that does not appear to happen for the stop word. I understand why, but wondered if anyone has any ideas on how to get around it for a single term.

    In case it's relevant this is currently a v7 site.

  • Simon Dingley 1470 posts 3427 karma points c-trib
    Sep 05, 2022 @ 10:35
    100

    After spending quite some time investigating this, I discovered that you cannot simply alter the stop word list (STOP_WORDS_SET) used by the StandardAnalyzer. When you inspect it, it is not a plain Hashtable but what is essentially a read-only representation of one, called an unmodifiable set, so words cannot be removed from it.

    It turns out that not much code is required to overcome this. I created a new analyzer that inherits from StandardAnalyzer and passes a modified stop word list to the base constructor. In case it is of use to anyone else, here is what I've done.

    using System.Collections;
    using System.Linq;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    
    namespace My.Namespace
    {
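        // StandardAnalyzer variant whose stop word list no longer contains "will",
        // so the term is kept in the index and can be searched for and boosted.
        public class MyStandardAnalyser : StandardAnalyzer
        {
            public MyStandardAnalyser()
                : base(Lucene.Net.Util.Version.LUCENE_29, MyStandardAnalyser.ENGLISH_STOP_WORDS_SET)
            {
            }

            // Builds a read-only stop word set from the default list, minus "will".
            private static Hashtable ENGLISH_STOP_WORDS_SET
            {
                get
                {
                    var stopWords = Lucene.Net.Analysis.Standard.StandardAnalyzer.STOP_WORDS;
                    var set = stopWords.Where(w => w.ToLower() != "will").ToArray();
                    var charSet = new CharArraySet(set, true); // true = ignore case
                    return CharArraySet.UnmodifiableSet(charSet);
                }
            }
        }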
    }
    

    I then amended the ExamineSettings.config file so that the ExternalIndexer and ExternalSearcher both use the new analyzer.
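
    For anyone following along, the config change might look roughly like the fragment below. This is only a sketch: it assumes the default v7 provider types, and "My.Assembly" is a placeholder for whichever assembly your analyzer class is compiled into. Leave any other attributes on your existing entries as they are and just add or change the analyzer attribute.

    ```xml
    <!-- Inside ExamineIndexProviders > providers -->
    <add name="ExternalIndexer"
         type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
         analyzer="My.Namespace.MyStandardAnalyser, My.Assembly" />

    <!-- Inside ExamineSearchProviders > providers -->
    <add name="ExternalSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         analyzer="My.Namespace.MyStandardAnalyser, My.Assembly" />
    ```

    The searcher needs the same analyzer as the indexer, otherwise "will" is stripped from the query before it reaches the index. Content indexed with the old analyzer will not contain the term, so the index needs rebuilding after the change.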
