
  • Tomasz 19 posts 91 karma points
    Oct 10, 2019 @ 03:33

    Examine wildcard searches and no documentation or samples

    Hi,

    I'm struggling to make sense of Examine wildcard searches without proper examples or docs, and I can't find anyone else reporting the same problem, so here goes.

    I have an Umbraco API controller that I am trying to use for search. Its action takes a search term (q) that gets passed into a query:

            var indexer = _examineManager.Indexes.FirstOrDefault(w => w.Name.Equals("ExternalIndex"));
            if (indexer == null) return model;
    
            var searcher = indexer.GetSearcher();
    
            var resultsByNodeName = searcher.CreateQuery("content")
                //.NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q)
    
                .NativeQuery(GetNativeLuceneQuery(q, string.Empty, _contentFieldsToSearch))
                .Execute(limit) // max limit of results
                .OrderByDescending(o => o.Score);
    

    What am I trying to achieve? Searching for "well" or "wellbeing" in the Backoffice Examine Manager returns 3 results, but my API does not find any matching results (apart from some other odd results).

    (Screenshot: backoffice search results)

    My first solution was the plain fluent query, using .NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q) to create and execute the query. However, that does not handle wildcard matches, so it is no use to me at this stage.

    Second solution (as in the sample above):

    I have written a little helper, GetNativeLuceneQuery, to try to work out why on earth this does not return exactly the same set of results as the Backoffice interface. It is based on https://github.com/umbraco/Umbraco-CMS/blob/b260f6c6dedc6c8c123de5f2348b4d7f0dcb51bd/src/Umbraco.Web/Editors/EntityController.cs#L244

    private string GetNativeLuceneQuery(string query, string type, string[] fields)
        {
            if (string.IsNullOrWhiteSpace(query)) return string.Empty;
    
            if (string.IsNullOrWhiteSpace(query.Trim(new[] { '\"', '\'' }))) return string.Empty;
    
            //build a lucene query:
            // the __nodeName will be boosted 10x without wildcards
            // then __nodeName will be matched normally with wildcards
            // the remaining fields will also be matched with wildcards
            // (a quoted query is matched as an exact phrase, without wildcards)
            var nativeQuery = string.Empty;
    
            //check if text is surrounded by single or double quotes, if so, then exact match
            var surroundedByQuotes = Regex.IsMatch(query, "^\".*?\"$") || Regex.IsMatch(query, "^\'.*?\'$");
            if (surroundedByQuotes)
            {
                //strip quotes, escape string, then add the quotes back
                query = query.Trim(new[] { '\"', '\'' });
                query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
                if (string.IsNullOrWhiteSpace(query)) return string.Empty;
    
                //add back the surrounding quotes
                query = string.Format("{0}{1}{0}", "\"", query);
    
                //node name exactly boost x 10
                nativeQuery += $"+(__nodeName: ({query.ToLower()})^10.0 ";
    
                //additional fields normally
                //(assign rather than +=: Aggregate is already seeded with nativeQuery, so += would duplicate it)
                nativeQuery = fields.Aggregate(nativeQuery, (current, f) => current + $"{f}: ({query}) ");
            }
            else
            {
                query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
    
                var querywords = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    
                //node name exactly boost x 10
                nativeQuery += $"+(__nodeName:\"{query.ToLower()}\"^10.0 ";
    
                //node name normally with wildcards
                nativeQuery += " __nodeName:(";
                nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
                nativeQuery += ") ";
    
                foreach (var f in fields)
                {
                    //additional fields with wildcards
                    nativeQuery += $"{f}:(";
                    nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
                    nativeQuery += $") ";
                }
            }
    
    
            nativeQuery += string.IsNullOrEmpty(type) ? $") +__IndexType:content" : $") +__IndexType:{type}";
    
            nativeQuery += $" +excludeFromSearch:(0) "; // only search if not excluded from search
    
            return nativeQuery;
        }
    

    The API returns 0 results for both "well" and "wellbeing". The generated Lucene queries are as follows:

    +(__nodeName:"well"^10.0  __nodeName:(well*) searchKeywords:(well*) grid:(well*) ) +__IndexType:content +excludeFromSearch:(0) 
    
    +(__nodeName:"wellbeing"^10.0  __nodeName:(wellbeing*) searchKeywords:(wellbeing*) grid:(wellbeing*) ) +__IndexType:content +excludeFromSearch:(0) 
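
    For comparison, here is a minimal sketch of how the unquoted branch could be written with a StringBuilder instead of chained Aggregate calls, which makes the resulting string easier to check by eye. NativeQuerySketch and Build are hypothetical names, not part of Examine or Umbraco, and the sketch assumes the input has already been passed through QueryParser.Escape. It only builds the string; it does not explain why the index returns no hits.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only; not an Examine or Umbraco API.
public static class NativeQuerySketch
{
    // Assumption: escapedQuery was already escaped with QueryParser.Escape.
    public static string Build(string escapedQuery, string[] fields, string indexType)
    {
        var lowered = escapedQuery.ToLowerInvariant();

        // "well being" -> "well* being*"
        var words = lowered.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var wildcards = string.Join(" ", words.Select(w => w + "*"));

        var sb = new StringBuilder();
        sb.Append("+(");
        // exact node name match, boosted x 10
        sb.Append($"__nodeName:\"{lowered}\"^10.0 ");
        // node name with trailing wildcards
        sb.Append($"__nodeName:({wildcards}) ");
        // every additional field with trailing wildcards
        foreach (var f in fields)
        {
            sb.Append($"{f}:({wildcards}) ");
        }
        // close the group, then restrict by index type and exclusion flag
        sb.Append($") +__IndexType:{indexType} +excludeFromSearch:(0)");
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build("well", new[] { "searchKeywords", "grid" }, "content"));
    }
}
```

    For "well" this produces the same clauses as the first query above, differing only in whitespace.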
    

    The absurd part is that if I search the Backoffice for "abs", expecting to get at least one document (the one with the node name "Absence / Leave Information"), I get no results, while the API returns exactly what I expect...

    Please help.

  • Tomasz 19 posts 91 karma points
    Oct 10, 2019 @ 04:01

    I've just come across this, and it solved my search needs:

    https://our.umbraco.com/forum/umbraco-8/97707-examine-leading-wildcard#comment-309422
