
  • Ryan Rueckl 17 posts 74 karma points
    Feb 06, 2015 @ 18:11
    Ryan Rueckl
    0

    Search not returning expected results

    I'm using Umbraco 7.2.1 on .NET 4.5.2 on a Windows 7 machine. I set up a basic search index and provider using the standard analyzer. I did not specify any index fields, so everything gets indexed. When I run a basic search on the singular and plural forms of a word, I get different results. The odd part is that I get fewer results with the singular form: searching on 'solutions' gives me 11 results, but 'solution' gives me only 2. It looks like it is doing an exact word match, but I thought the standard analyzer would match all or part of a word. My query and config are below. Should I be using a different analyzer?

    When I use the Examine Management dashboard in the backoffice I can reproduce this behavior not just in my custom index, but also in the InternalIndex/Searcher.

    var searcher = Examine.ExamineManager.Instance.SearchProviderCollection["SearchSearcher"];
    var searchResults = searcher.Search(searchTerm, false).OrderByDescending(x => x.Score);
    
    
    <IndexSet SetName="SearchIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Search/">
      <ExcludeNodeTypes>
        <add Name="Image" />
        <add Name="Folder" />
        <add Name="Homepage" />
        <add Name="SettingsFolder" />
        <add Name="GlobalSettings" />
        <add Name="Navigation" />
      </ExcludeNodeTypes>
    </IndexSet>
    
    <add name="SearchIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
      supportUnpublished="false"
      supportProtected="true"
      interval="30"
      analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
      indexSet="SearchIndexSet" />
    
    <add name="SearchSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
      analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true" indexSet="SearchIndexSet" />
    
  • Valerie 67 posts 163 karma points
    Feb 09, 2015 @ 10:29
    Valerie
    0

    As far as I know the Standard Analyser only breaks up the search query into terms, without any stemming or altering the original term. The number of results is based on the exact matching terms. I've never used it myself but I think SnowballAnalyzer can achieve this?

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 09, 2015 @ 11:11
    Ismail Mayat
    100

    Ryan,

    Wildcard your searchTerm with searchTerm.MultipleCharacterWildcard() (it's an extension method) and see if that helps. If you need stemming, you will need to use Snowball.
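
    A minimal sketch of what such a wildcard query could look like with Examine's fluent API. The searcher name is taken from the config earlier in the thread; the field names "nodeName" and "bodyText" are assumptions for illustration and should be replaced with fields that actually exist in your index. The sketch also assumes a single-word search term:

    ```csharp
    using System.Collections.Generic;
    using System.Linq;
    using Examine;
    using Examine.LuceneEngine.SearchCriteria; // provides MultipleCharacterWildcard()

    public static class WildcardSearch
    {
        // Hypothetical field list; adjust to the fields in your own index.
        private static readonly string[] Fields = { "nodeName", "bodyText" };

        public static IEnumerable<SearchResult> Run(string searchTerm)
        {
            // The wildcard extension cannot be applied to an empty term.
            if (string.IsNullOrWhiteSpace(searchTerm))
            {
                return Enumerable.Empty<SearchResult>();
            }

            var searcher = ExamineManager.Instance.SearchProviderCollection["SearchSearcher"];

            // Match any indexed term that starts with the search term (e.g. solution*).
            var criteria = searcher.CreateSearchCriteria();
            var query = criteria
                .GroupedOr(Fields, searchTerm.MultipleCharacterWildcard())
                .Compile();

            return searcher.Search(query).OrderByDescending(x => x.Score);
        }
    }
    ```

    Note that the fluent API needs the field names up front, which is different from the raw searcher.Search(searchTerm, false) call in the original question.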

    Regards

    Ismail

  • Ryan Rueckl 17 posts 74 karma points
    Feb 10, 2015 @ 18:09
    Ryan Rueckl
    0

    I was trying to do a basic search (on all fields since there aren't many). I couldn't get the wildcard search working because it wanted me to supply the field(s) to search on, so I just concatenated the data in the GatheringNodeData event and searched on that field and it works now.
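
    A rough sketch of the concatenation approach described above, assuming Umbraco 7 and the indexer name from the config earlier in the thread. The class name and the "combinedText" field name are hypothetical; the original post does not say what the field was called:

    ```csharp
    using Examine;
    using Umbraco.Core;

    public class SearchIndexEvents : ApplicationEventHandler
    {
        protected override void ApplicationStarted(
            UmbracoApplicationBase umbracoApplication,
            ApplicationContext applicationContext)
        {
            // Hook into the custom indexer defined in ExamineSettings.config.
            ExamineManager.Instance.IndexProviderCollection["SearchIndexer"]
                .GatheringNodeData += OnGatheringNodeData;
        }

        private static void OnGatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            // Join every gathered field value into one searchable field.
            // "combinedText" is a made-up name; any unused field name would do.
            var combined = string.Join(" ", e.Fields.Values);
            e.Fields["combinedText"] = combined;
        }
    }
    ```

    A wildcard query can then target the single "combinedText" field instead of listing every field. The index has to be rebuilt before the new field shows up.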

    I was not able to get the snowball analyzer working - couldn't get it configured correctly, and couldn't find much documentation on how to properly configure it to use in Umbraco. I will likely look more into it in another project.

    It would definitely be useful to see some documentation/examples of how other people have implemented their own search in Umbraco without having to spend a lot of time poring through forum posts. Wishful thinking I guess.

    Thanks.

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 10, 2015 @ 18:28
    Ismail Mayat
    0

    Ryan,

    I have used Snowball, but with older Umbraco versions, pre-Examine. In this instance we have Examine as a wrapper around things, so I'm not sure how you would use Snowball, unless you run your term through Snowball before you query?

    Regards

    Ismail

  • Valerie 67 posts 163 karma points
    Feb 11, 2015 @ 09:26
    Valerie
    0

    Yeah documentation is a real issue with the search stuff!

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 11, 2015 @ 11:02
    Ismail Mayat
    1

    Valerie,

    With regard to docs, I would recommend:

    https://our.umbraco.org/Documentation/Reference/Searching/Examine/index

    https://www.youtube.com/watch?v=6AMb0rrSrJw

    http://thecogworks.co.uk/blog/posts/2012/november/examiness-hints-and-tips-from-the-trenches-part-1/ (there are 10 parts; they expand on topics in my presentation)

    Work through them in that order and you will be an Examine guru in no time ;-}

  • Valerie 67 posts 163 karma points
    Feb 11, 2015 @ 11:27
    Valerie
    0

    Is there anything out there in terms of setting up Facets? I saw that Shannon had a branch on the go at some point but it doesn't seem to have been finished.

    Edit: Nevermind, just saw it's covered in your video! Thanks for this :D I've been struggling with them for a while.

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 11, 2015 @ 11:56
    Ismail Mayat
    0

    I have not used anything in Shannon's branch. I have played with Bobo, and there is a really good article from 24 Days in Umbraco here: http://24days.in/umbraco/2014/search-with-bobo/

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Feb 11, 2015 @ 12:24
    Ismail Mayat
    0

    Ryan,

    Did some reading. You could download the Snowball analyzer from Lucene.Net contrib and build it, then in the Examine config replace the analyzer for both the indexer and the searcher with Snowball. On indexing it will then stem, and on searching it will run the query term through stemming as well. I have not tried this, but I reckon it should work.

    Regards

    Ismail

  • Mark 255 posts 612 karma points
    Jun 23, 2015 @ 10:12
    Mark
    0

    Hi Ismail. I'm considering using the Snowball analyzer. But I'm not clear how I'd specify the "English" version, and also the stop words, in the examine config file. The constructor mentions that these need to be specified; e.g. the stop words will be empty otherwise. Perhaps the analyzer can be added in code.

  • Valerie 67 posts 163 karma points
    Feb 12, 2015 @ 13:59
    Valerie
    0

    I managed to get my results to behave like this by adding "fuzzy" to the criteria (I wanted "politic" to return "politics" and "political").

    https://gist.github.com/anonymous/6076f3fdd1e6e3f66b1d

    Like here http://umbraco.com/follow-us/blog-archive/2011/9/16/examining-examine
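
    For readers who cannot open the gist, a fuzzy criteria might look roughly like the sketch below. This is not the code from the gist; the field name "bodyText" and the similarity value 0.7f are assumptions for illustration, and the searcher name is borrowed from the config earlier in the thread:

    ```csharp
    using Examine;
    using Examine.LuceneEngine.SearchCriteria; // provides Fuzzy()

    public static class FuzzySearch
    {
        public static ISearchResults Run(string searchTerm)
        {
            var searcher = ExamineManager.Instance.SearchProviderCollection["SearchSearcher"];

            // Fuzzy matching accepts terms within a similarity threshold of the
            // search term; a lower value gives looser matching.
            var criteria = searcher.CreateSearchCriteria();
            var query = criteria
                .Field("bodyText", searchTerm.Fuzzy(0.7f))
                .Compile();

            return searcher.Search(query);
        }
    }
    ```

    Fuzzy matching is based on edit distance rather than stemming, so it can also return unrelated words that merely look similar.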

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Jun 23, 2015 @ 13:05
    Ismail Mayat
    1

    Mark,

    You will need to create your own analyzer that inherits from Snowball, hardcode the stop words and the stemmer name in its constructor, and pass those up to the Snowball base constructor. Then, in the Examine config, use your class instead of using Snowball directly.

    Regards

    Ismail

  • Mark 255 posts 612 karma points
    Jun 23, 2015 @ 13:09
    Mark
    0

    Thanks Ismail. That makes obvious sense. One other query I had was around the stop words. Is there a standard array of English stop words within Lucene.Net that I can use? Or will I need to generate my own array?

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Jun 23, 2015 @ 15:38
    Ismail Mayat
    1

    Mark,

    You can get them directly from Lucene.Net's analyzer classes. The list is exposed as a property, so you should be able to do:

    Lucene.Net.Analysis.Core.StopAnalyzer.ENGLISH_STOP_WORDS_SET
    

    It returns a CharArraySet, so you may have to fiddle with that to get a list of strings, unless Snowball will take a CharArraySet.

  • Mark 255 posts 612 karma points
    Sep 22, 2015 @ 14:04
    Mark
    0

    Hi Ismail,

    I'm starting on the development work for this today, but have another question. The version of Lucene.Net in Umbraco 6.1.4 is 2.9.4 (file version 2.9.4.1). The only contrib version I can find is 3.0.3, which requires Lucene.Net 3.0.3.

    Do you know if there will be any issues in updating the Lucene.Net dll to version 3.0.3? Or am I safer to try and find the 2.9.4 compatible version of contrib? Do you know where I can find it?

    Thanks, Mark

  • Ismail Mayat 4511 posts 10091 karma points MVP 2x admin c-trib
    Sep 22, 2015 @ 14:58
    Ismail Mayat
    0

    Mark,

    Any reason why you want the latest version of Lucene.Net? I suspect Examine would probably break if you replaced the version of Lucene.Net it's been built against.

    Regards

    Ismail

  • Mark 255 posts 612 karma points
    Sep 22, 2015 @ 15:56
    Mark
    0

    Hi Ismail,

    That's what I was asking. I don't want to use the latest version of Lucene.Net (3.0.3); I want to use version 2.9.4 (used by Umbraco 6.1.4). The problem is, I can't find a downloadable version of contrib 2.9.4. All I can find is contrib version 3.0.3, but that requires Lucene.Net 3.0.3. I'm trying to avoid the situation you're describing. :-)

    The thing is, I haven't had cause to download lucene before, and I'm struggling to find it.

    Regards, Mark

  • Mark 255 posts 612 karma points
    Sep 22, 2015 @ 16:37
    Mark
    0

    Eureka! I've found it on Nuget:

    Install-Package Lucene.Net.Contrib -Version 2.9.4.1

    For some reason I just can't seem to find what I'm looking for here:

    http://lucenenet.apache.org/

    I end up trawling through subversion web pages that don't provide the ability to download anything. I'm obviously missing something!

    Anyway, thanks for all your help, Ismail

  • Mark 255 posts 612 karma points
    Sep 23, 2015 @ 05:34
    Mark
    2

    For anyone uncertain how to implement the Snowball Analyzer, this is what I did:

    Note: Umbraco version 6.1.4. For other versions, I would check you are using the correct version of Lucene.Net.Contrib.

    Created a new class library in Visual Studio called LuceneNetContrib.

    Installed nuget package:

    Install-Package Lucene.Net.Contrib -Version 2.9.4.1
    

    Added the following class to the class library:

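    namespace LuceneNetContrib
    {
        // Wraps SnowballAnalyzer in a parameterless class so it can be named in ExamineSettings.config.
        public class EnglishSnowballAnalyzer : Lucene.Net.Analysis.Snowball.SnowballAnalyzer
        {
            // "English" selects the stemmer; the stop words are Lucene.Net's built-in English list.
            public EnglishSnowballAnalyzer() : base("English", Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS)
            { }
        }
    }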
    

    Compiled the project and copied "Lucene.Net.Contrib.Snowball.dll" and "LuceneNetContrib.dll" to the Umbraco /bin/ folder.

    Then, in Umbraco opened "/Config/ExamineSettings.config" and specified the analyzer. So:

    <add name="SiteSearchContentIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                 supportUnpublished="false"
                 supportProtected="true" />
    

    Becomes:

    <add name="SiteSearchContentIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                 supportUnpublished="false"
                 supportProtected="true"
                 analyzer="LuceneNetContrib.EnglishSnowballAnalyzer, LuceneNetContrib" />
    

    Do the same thing for the searcher, so:

    <add name="SiteSearchContentSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    

    Becomes

    <add name="SiteSearchContentSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
                 analyzer="LuceneNetContrib.EnglishSnowballAnalyzer, LuceneNetContrib" />
    

    Finally, rebuilt the index via the CMS (Developer section, Examine Management tab) and tested. All working as expected!
