
  • Cimplex 113 posts 576 karma points
    Mar 03, 2020 @ 20:05

    Hyphens in Lucene search (Examine)

    Hi! I have a Lucene index named DownloadIndex. The index has the following fields:

    • articleNumber
    • downloadType
    • documentName

    Most of the items in the index have an articleNumber that contains a hyphen, for example: x-way

    The problem is that when we use the StandardAnalyzer and search for x-way, we get 250 results:

    var searchResults = searcher.CreateQuery().Field("languageId", language.Id).And().Field("articleNumber", "x-way").Execute();
    

    Lucene Query:

    +(languageId:[4927 TO 4927]) +articleNumber:x 
    

    When analyzing the query you can see that everything after the hyphen is removed and the search term is only "x". That's why we get 250 results.

    I've also tried the WhitespaceAnalyzer and the KeywordAnalyzer, but without luck.

    I also tried a wildcard search, which results in the following query:

    +(languageId:[4927 TO 4927]) +articleNumber:x-way*
    

    But that doesn't help.

    Any ideas?

  • Marc Goodson 2155 posts 14408 karma points MVP 9x c-trib
    Mar 04, 2020 @ 19:33

    Hi Cimplex

    Yes, I think this is a known quirk of Lucene with hyphens and also underscores...

    I ran into the issue when talking with Jeavon about his PR here:

    https://github.com/umbraco/Umbraco-CMS/pull/6579

    This involved searching for GUIDs in Umbraco that contained hyphens.

    And Shannon found this Stack Overflow suggestion: https://stackoverflow.com/questions/16858880/java-lucene-search-query-hyphens-with-wildcards/

    In the end, the short-term fix was to remove the hyphens from the data in the searchable field when it was indexed, and then remove any hyphens from the search text.

    Essentially if you handle the TransformingIndexValue event:

    https://our.umbraco.com/Documentation/Reference/Searching/Examine/examine-events

    You could add a new field to the Examine index called 'StrippedArticleNumber', which has any hyphens or underscores removed, so X-Way would be stored as XWay. When somebody searches for 'X-Way', you would again strip the hyphen from the search term and send 'XWay' instead, which should match the article.
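
    As a rough sketch of the stripping step only: the helper below is hypothetical (the class and method names are my own, not part of Examine or Umbraco) and uses nothing beyond plain string handling. The idea is to run the same function over the value at index time and over the search term at query time, so both sides agree.

```csharp
using System;

public static class ArticleNumberNormalizer
{
    // Hypothetical helper: removes hyphens and underscores,
    // so "X-Way" becomes "XWay". Null or empty input yields an empty string.
    public static string Strip(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        return value.Replace("-", string.Empty).Replace("_", string.Empty);
    }
}
```

    You would call Strip when populating 'StrippedArticleNumber' in the indexing event handler, and again on the user's search term before passing it to the query against that field. Casing is left untouched here, on the assumption that the analyzer on the field takes care of it.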

    regards

    Marc
