  • Squazz 35 posts 111 karma points
    Sep 30, 2015 @ 13:29

    MultiIndexSearcher: Using custom stopwords

    I have my CombinedIndexSearcher set up this way:

    <add name="CombinedIndexSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine" 
                analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
                indexSets="UserIndexSet,ExternalIndexSet" 
                enableLeadingWildcard="true"/>
    

    To test if manually setting my stopwords works, I have tried to remove all of them like this:

    protected override void OnApplicationStarted(object sender, EventArgs e)
    {
       Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET = new System.Collections.Hashtable();
    }
    

    Then I created my own search method, which starts by instantiating the searcher like this:

    var searcher = (MultiIndexSearcher)ExamineManager.Instance.SearchProviderCollection["CombinedIndexSearcher"];
    

    When I inspect the newly created searcher, it reports a StandardAnalyzer, as set in my configuration. But its stopSet property also contains 33 elements, which is quite alarming.

    Is there anything I'm doing wrong? Does anyone know why I still get stop words after emptying StopAnalyzer.ENGLISH_STOP_WORDS_SET?

    Any input is great, I'm kinda stuck atm :/

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Nov 17, 2015 @ 13:48

    Squazz,

    You have set the StopAnalyzer stop words, but the StandardAnalyzer stop words are still present?

    I suspect you need either a new analyser without stop words, or to set the StandardAnalyzer stop words to nothing, although that may have other unintended consequences.

    I think you need a new analyser for your custom searcher, based on StandardAnalyzer but without stop words.

    Regards

    Ismail

  • Squazz 35 posts 111 karma points
    Nov 17, 2015 @ 13:57

    Ismail,

    "You have set the StopAnalyzer stop words, but the StandardAnalyzer stop words are still present?"

    Yes, that is true.

    "I think you need a new analyser for your custom searcher, based on StandardAnalyzer but without stop words."

    This might very well be the solution I'm looking for. But I thought I had already done exactly that by modifying the existing StandardAnalyzer?

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Nov 17, 2015 @ 14:27

    You said you have modified the StandardAnalyzer; however, in your config you have:

    analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer

    Should that not be set to your custom one?

    Regards

    Ismail

  • Squazz 35 posts 111 karma points
    Nov 17, 2015 @ 14:51

    That's right, I'm using the StandardAnalyzer, but I have also used this code:

    protected override void OnApplicationStarted(object sender, EventArgs e)
    {
       Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET = new System.Collections.Hashtable();
    }
    

    That is where I remove all of the stop words.

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Nov 17, 2015 @ 14:57

    That is on StopAnalyzer, not StandardAnalyzer? Unless StandardAnalyzer inherits from StopAnalyzer? Also, I do not think that will work; you need to actually create a new analyser and set the stop words there. If you look on the forum, I recall someone doing this a couple of years ago. The post had code as well.

    Regards

    Ismail

  • Squazz 35 posts 111 karma points
    Nov 17, 2015 @ 15:05

    According to the docs for Lucene.Net, StandardAnalyzer does not literally inherit from StopAnalyzer, but it does take its default stop words from it in its static constructor:

    static StandardAnalyzer()
    {
        STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
    }
    

    I might have to try creating my own analyzer.
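
    A minimal sketch of one possible reason the reassignment has no visible effect. It assumes that an analyzer was already created before OnApplicationStarted ran, so the static constructor above had already copied the reference to the original table. StopSource and StandardLike are hypothetical stand-ins written for this illustration; they are not Lucene.Net classes, and the three stop words are made up.

    ```csharp
    using System;
    using System.Collections;

    // Hypothetical stand-in for StopAnalyzer: a reassignable static default set.
    public static class StopSource
    {
        public static Hashtable EnglishStopWords = new Hashtable
        {
            { "a", "a" }, { "and", "and" }, { "the", "the" }
        };
    }

    // Hypothetical stand-in for StandardAnalyzer: copies the reference once,
    // at the moment its static constructor runs.
    public class StandardLike
    {
        public static readonly Hashtable StopWordsSet;

        static StandardLike()
        {
            StopWordsSet = StopSource.EnglishStopWords;
        }

        public readonly Hashtable StopSet;

        // Default constructor falls back to the captured static set.
        public StandardLike() : this(StopWordsSet) { }

        // Explicit stop words bypass the static default entirely.
        public StandardLike(Hashtable stopWords)
        {
            StopSet = stopWords;
        }
    }

    public static class StaticCaptureDemo
    {
        public static int[] Run()
        {
            // Something creates an analyzer early (assumed: before startup code runs).
            var early = new StandardLike();

            // Startup code then points the source field at a new, empty table.
            StopSource.EnglishStopWords = new Hashtable();

            // A default-constructed analyzer still sees the table captured earlier.
            var late = new StandardLike();

            // Passing the stop words explicitly gives an empty set.
            var custom = new StandardLike(new Hashtable());

            return new[] { early.StopSet.Count, late.StopSet.Count, custom.StopSet.Count };
        }

        public static void Main()
        {
            Console.WriteLine(string.Join(", ", Run())); // prints "3, 3, 0"
        }
    }
    ```

    If this is what happens in the real setup, assigning a new Hashtable only changes what the StopAnalyzer field points to; anything that copied the old reference keeps the old table. Passing the stop words to the analyzer's constructor avoids depending on initialization order.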

  • Mikkel Johansen 116 posts 292 karma points
    Nov 17, 2015 @ 14:56

    I am trying the exact same thing. Right now I am trying to make my own StandardAnalyzer with an empty list of stop-words.

    And it seems to work :-)

    Web.config

    <add name="UserIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine"
       dataService="Service.SearchHandler.Indexes.UserIndex, Service"
       indexTypes="Users" runAsync="true" indexSet="UserIndexSet"
       analyzer="Service.SearchHandler.MyAnalyzer, Service" />
    

    MyAnalyzer.cs

    using System.Collections;
    using Lucene.Net.Analysis.Standard;
    using Version = Lucene.Net.Util.Version;
    
    namespace Service.SearchHandler
    {
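        // StandardAnalyzer variant that uses a caller-supplied stop word set
        // instead of the default English one.
        public class MyAnalyzer : StandardAnalyzer
        {
            // Parameterless constructor: an empty table means no stop words.
            public MyAnalyzer() : this(new Hashtable())
            {
            }
    
            // Passes the given stop words straight to StandardAnalyzer.
            public MyAnalyzer(Hashtable stopwords) : base(Version.LUCENE_29, stopwords)
            {
            }
        }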
    }
    
  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Nov 17, 2015 @ 15:00
  • Mikkel Johansen 116 posts 292 karma points
    Nov 17, 2015 @ 15:09

    That was the link I have been looking for all day long :-)

    But my colleague and I are now a little wiser about Lucene.Net.

    -Mikkel
