
  • Bala Gudibandla 12 posts 131 karma points
    Jun 15, 2017 @ 20:40

    Fuzzy search with SnowballAnalyzer

    I've been working on building user-friendly search functionality for a website, and I'm trying to implement fuzzy matching that works with SnowballAnalyzer. However, SnowballAnalyzer does not seem to be applied when the term has '~' appended.

    Lucene query:

    -hideFromSearch:1 +(seoMetaKeywords:patrner~0.6 pageName:patrner~0.6 bodyText:patrner~0.6 richText:patrner~0.6 FileTextContent:patrner~0.6 ) +(seoMetaKeywords:pharmacies~0.6 pageName:pharmacies~0.6 bodyText:pharmacies~0.6 richText:pharmacies~0.6 FileTextContent:pharmacies~0.6 ) 
    

    Search code:

    var model = new SearchViewModel
        {
            SearchTerm = EscapeSearchTerm(CleanseSearchTerm(("" + Request["q"]).ToLower(CultureInfo.InvariantCulture))),
            CurrentPage = int.TryParse(Request["p"], out parsedInt) ? parsedInt : 1,
    
            PageSize = GetMacroParam(Model, "pageSize", s => int.Parse(s), 10),
            RootContentNodeId = GetMacroParam(Model, "rootContentNodeId", s => int.Parse(s), -1),
            RootMediaNodeId = GetMacroParam(Model, "rootMediaNodeId", s => int.Parse(s), -1),
            IndexType = GetMacroParam(Model, "indexType", s => s.ToLower(CultureInfo.InvariantCulture), ""),
            SearchFields = GetMacroParam(Model, "searchFields", s => SplitToList(s), new List<string> { "nodeName", "metaTitle", "metaDescription", "metaKeywords", "bodyText" }),
            PreviewFields = GetMacroParam(Model, "previewFields", s => SplitToList(s), new List<string> { "bodyText" }),
            PreviewLength = GetMacroParam(Model, "previewLength", s => int.Parse(s), 250),
            HideFromSearchField = GetMacroParam(Model, "hideFromSearchField", "umbracoNaviHide"),
            SearchFormLocation = GetMacroParam(Model, "searchFormLocation", s => s.ToLower(), "bottom")
        };
    
        // Validate values
        if (model.IndexType != UmbracoExamine.IndexTypes.Content &&
            model.IndexType != UmbracoExamine.IndexTypes.Media)
        {
            model.IndexType = "";
        }
    
        if (model.SearchFormLocation != "top"
            && model.SearchFormLocation != "bottom"
            && model.SearchFormLocation != "both"
            && model.SearchFormLocation != "none")
        {
            model.SearchFormLocation = "bottom";
        }
    
        // ====================================================
        // Comment the next if statement out if you want a root
        // node id of -1 to search content across all sites
        // and not just the current site.
        // ====================================================
        if (model.RootContentNodeId <= 0)
        {
            model.RootContentNodeId = Model.Content.AncestorOrSelf(1).Id;
        }
    
        // If searching on umbracoFile, also search on umbracoFileName
        if (model.SearchFields.Contains("umbracoFile") && !model.SearchFields.Contains("umbracoFileName"))
        {
            model.SearchFields.Add("umbracoFileName");
        }
    
        // Check the search term isn't empty
        if(!string.IsNullOrWhiteSpace(model.SearchTerm))
        {
            // Tokenize the search term
            model.SearchTerms = Tokenize(model.SearchTerm);
    
            // Perform the search
            var searcher = ExamineManager.Instance.SearchProviderCollection["ContentSearcher"];
            var criteria = searcher.CreateSearchCriteria();
            var query = new StringBuilder();
            query.AppendFormat("-{0}:1 ", model.HideFromSearchField);
    
            // Set search path
            var contentPathFilter = model.RootContentNodeId > 0
                ? string.Format("__IndexType:{0} +searchPath:{1} -template:0", UmbracoExamine.IndexTypes.Content, model.RootContentNodeId)
                : string.Format("__IndexType:{0} -template:0", UmbracoExamine.IndexTypes.Content);
    
            // Ensure page contains all search terms in some way
            foreach (var term in model.SearchTerms)
            {
                var groupedOr = new StringBuilder();
                foreach (var searchField in model.SearchFields)
                {
                    groupedOr.AppendFormat("{0}:{1}~ ", searchField, term);
                }
                query.Append("+(" + groupedOr.ToString() + ") ");
            }
    
            var criteria2 = criteria.RawQuery(query.ToString());
    
            var results = searcher.Search(criteria2).ToList();
        }
    

    ExamineSettings.config:

    <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                     analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"/>
    
    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
                     analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"
                     extensions=".pdf"
                     umbracoFileProperty="umbracoFile"/>
    
    
    <!--Searcher-->
    <add name="ContentSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine" analyzer="Our.Umbraco.ezSearch.SnowballAnalyzerEnglish, Our_Umbraco"
                  enableLeadingWildcards="true" indexSets="ExternalIndexSet,PDFIndexSet"/>
    

    SnowballAnalyzer works when the query does not contain the fuzzy search symbol (~), but I want it to work together with ~.

    Thanks in advance for the help :)

  • Damiaan 442 posts 1302 karma points MVP 6x c-trib
    Jun 19, 2017 @ 09:15

    I guess the analyzer does not support fuzzy search?

    Where did you find the analyzer?

  • Bala Gudibandla 12 posts 131 karma points
    Jun 19, 2017 @ 13:47

    I found the analyzer at: https://our.umbraco.org/forum/umbraco-7/using-umbraco-7/61148-Search-not-returning-expected-results#comment-230133

    A fuzzy query needs the ~ symbol at the end of each search keyword, but once it is there, SnowballAnalyzer (used for stemming) no longer treats the keyword as an English word to be stemmed.

    If ~ is removed, stemming works, but fuzzy search doesn't.
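
    One likely explanation, as far as I can tell: Lucene's classic query parser does not pass fuzzy (~), wildcard, or prefix terms through the analyzer, so a term such as pharmacies~0.6 is compared as typed against the already-stemmed terms in the index. A possible workaround is to emit both a plain clause (which the analyzer stems) and a fuzzy clause for every field. The sketch below only builds the query string. FuzzyQuerySketch and BuildTermGroup are hypothetical names, not part of ezSearch or Examine; the 0.6 similarity is taken from the query above; and the term is assumed to be escaped already.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of ezSearch or Examine.
public static class FuzzyQuerySketch
{
    // Builds one required group for a single term, for example:
    // +(pageName:partner pageName:partner~0.6)
    public static string BuildTermGroup(IEnumerable<string> fields, string term, double minSimilarity)
    {
        var groupedOr = new StringBuilder();
        foreach (var field in fields)
        {
            // Plain clause: run through the analyzer by the query parser, so it gets stemmed.
            groupedOr.AppendFormat(CultureInfo.InvariantCulture, "{0}:{1} ", field, term);

            // Fuzzy clause: assumed to bypass the analyzer, so it matches on the raw term.
            groupedOr.AppendFormat(CultureInfo.InvariantCulture, "{0}:{1}~{2:0.0} ", field, term, minSimilarity);
        }

        // Trim the trailing space and mark the whole group as required.
        return "+(" + groupedOr.ToString().TrimEnd() + ")";
    }
}
```

    In the search code above, this would stand in for the inner foreach loop, roughly as query.Append(FuzzyQuerySketch.BuildTermGroup(model.SearchFields, term, 0.6) + " "). Because the clauses inside the group are optional relative to each other, a document should match if either the stemmed clause or the fuzzy clause hits in any field.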
