  • Tomasz 22 posts 96 karma points
    Oct 10, 2019 @ 03:33

    Examine wildcard searches and no documentation or samples

    Hi,

    I am struggling to make sense of a few things without proper examples / docs, and I can't see anyone else having the same problem, so here goes.

    I have an Umbraco API Controller I am trying to use for search; the action takes a "search term" that gets passed into a query.

            var indexer = _examineManager.Indexes.FirstOrDefault(w => w.Name.Equals("ExternalIndex"));
            if (indexer == null) return model;
    
            var searcher = indexer.GetSearcher();
    
            var resultsByNodeName = searcher.CreateQuery("content")
                //.NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q)
    
                .NativeQuery(GetNativeLuceneQuery(q, string.Empty, _contentFieldsToSearch))
                .Execute(limit) // max limit of results
                .OrderByDescending(o => o.Score);
    

    What am I trying to achieve? Searching for "well" or "wellbeing" in the Backoffice Examine Manager returns 3 results, BUT my API does not find ANY matching results (apart from some other odd results).

    [Screenshot: backoffice search results]

    My first solution was the "plain" approach: using .NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q) to create and execute the query. However, that does not deal with wildcard matches, so it is useless for me at this stage.

    Second solution (the one used in the sample above):

    I have written a little helper, GetNativeLuceneQuery, to try to work out why this does not return exactly the same set of results as the Backoffice interface. It is based on https://github.com/umbraco/Umbraco-CMS/blob/b260f6c6dedc6c8c123de5f2348b4d7f0dcb51bd/src/Umbraco.Web/Editors/EntityController.cs#L244

    private string GetNativeLuceneQuery(string query, string type, string[] fields)
        {
            if (string.IsNullOrWhiteSpace(query)) return string.Empty;
    
            if (string.IsNullOrWhiteSpace(query.Trim(new[] { '\"', '\'' }))) return string.Empty;
    
            //build a lucene query:
            // the __nodeName will be boosted 10x without wildcards
            // then __nodeName will be matched normally with wildcards
            // the additional fields will be matched normally, also with wildcards
            // (a quoted query is matched exactly, with no wildcards at all)
            var nativeQuery = string.Empty;
    
            //check if text is surrounded by single or double quotes, if so, then exact match
            var surroundedByQuotes = Regex.IsMatch(query, "^\".*?\"$") || Regex.IsMatch(query, "^\'.*?\'$");
            if (surroundedByQuotes)
            {
                //strip quotes, escape the string, then add the quotes back
                query = query.Trim(new[] { '\"', '\'' });
                query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
                if (string.IsNullOrWhiteSpace(query)) return string.Empty;
    
                //add back the surrounding quotes
                query = string.Format("{0}{1}{0}", "\"", query);
    
                //node name exactly boost x 10
                nativeQuery += $"+(__nodeName: ({query.ToLower()})^10.0 ";
    
                //additional fields normally (assign rather than +=, otherwise the node name clause is duplicated)
                nativeQuery = fields.Aggregate(nativeQuery, (current, f) => current + $"{f}: ({query}) ");
            }
            else
            {
                query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
    
                var querywords = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    
                //node name exactly boost x 10
                nativeQuery += $"+(__nodeName:\"{query.ToLower()}\"^10.0 ";
    
                //node name normally with wildcards
                nativeQuery += " __nodeName:(";
                nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
                nativeQuery += ") ";
    
                foreach (var f in fields)
                {
                    //additional fields normally
                    nativeQuery += $"{f}:(";
                    nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
                    nativeQuery += $") ";
                }
            }
    
    
            nativeQuery += string.IsNullOrEmpty(type) ? $") +__IndexType:content" : $") +__IndexType:{type}";
    
            nativeQuery += $" +excludeFromSearch:(0) "; // only search if not excluded from search
    
            return nativeQuery;
        }
    

    The API returns 0 results for both "well" and "wellbeing"; the Lucene queries are as follows:

    +(__nodeName:"well"^10.0  __nodeName:(well*) searchKeywords:(well*) grid:(well*) ) +__IndexType:content +excludeFromSearch:(0) 
    
    +(__nodeName:"wellbeing"^10.0  __nodeName:(wellbeing*) searchKeywords:(wellbeing*) grid:(wellbeing*) ) +__IndexType:content +excludeFromSearch:(0) 
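
    For reference, here is a minimal, self-contained sketch of how the unquoted branch composes these queries. It is only an illustration: WildcardQuerySketch and Build are hypothetical names, it covers the non-quoted case only, and it assumes the term has already been escaped (the helper above uses QueryParser.Escape for that). It has no Lucene or Examine dependency, so it can be run on its own to inspect the generated string.

    ```csharp
    using System;
    using System.Linq;
    using System.Text;

    public static class WildcardQuerySketch
    {
        // Hypothetical helper: mirrors only the unquoted branch of GetNativeLuceneQuery.
        // Assumes escapedTerm has already been escaped for Lucene.
        public static string Build(string escapedTerm, string[] fields, string type = "content")
        {
            var lowered = escapedTerm.ToLowerInvariant();
            var words = lowered.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            // Each word gets a trailing wildcard, e.g. "well being" -> "well* being*".
            var wildcards = string.Join(" ", words.Select(w => w + "*"));

            var sb = new StringBuilder();

            // Exact phrase on the node name, boosted 10x.
            sb.Append($"+(__nodeName:\"{lowered}\"^10.0 ");

            // Trailing-wildcard match on the node name, then on each additional field.
            sb.Append($"__nodeName:({wildcards}) ");
            foreach (var f in fields)
            {
                sb.Append($"{f}:({wildcards}) ");
            }

            // Close the group and append the two mandatory (+) clauses.
            sb.Append($") +__IndexType:{type} +excludeFromSearch:(0)");
            return sb.ToString();
        }

        public static void Main()
        {
            Console.WriteLine(Build("well", new[] { "searchKeywords", "grid" }));
        }
    }
    ```

    Apart from whitespace, this prints the same query as the first one above. Note that both +__IndexType and +excludeFromSearch are mandatory clauses, so a document has to match each of them as well as the grouped field clauses.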
    

    The absurd thing is that if I search the Backoffice for "abs", expecting to get at least 1 document (the one with the node name "Absence / Leave Information"), I get NO results, while the API returns what I am expecting to get...

    Please help.

  • Tomasz 22 posts 96 karma points
    Oct 10, 2019 @ 04:01

    I've just come across this, and it solved my "searching" needs.

    https://our.umbraco.com/forum/umbraco-8/97707-examine-leading-wildcard#comment-309422
