  • keilo 568 posts 1023 karma points
    Dec 02, 2014 @ 17:39
    keilo
    0

    Special Characters in Search Term

    I am getting an error in ezsearch.cshtml if the search string contains a - or :

    For example:

    solution: explorer

    or

    terms - service

    These are clearly the search filter special characters (-, +, :).

    So my question is: how do I handle these? Is there a config file to disable the special search characters?

  • Sunshine Lewis 31 posts 140 karma points c-trib
    Mar 19, 2015 @ 22:46
    Sunshine Lewis
    1

    Hi keilo!

    I was able to fix this by replacing those special characters with spaces (or nothing) before searching. I did this by adding a new helper function in ezSearch.cshtml (code sample edited to consolidate hyphen removal):

    // Escape the search term
    public string EscapeSearchTerm(string input)
    {
        //colon (:)
        input = Regex.Replace(input, @":", " ");
    
        // any hyphen that is between non digits or at the beginning of the term
        input = Regex.Replace(input, @"(\D*)-(\D)", "$1 $2");
    
        return input;
    }
    

    Then I call that function before the search term gets tokenized (approximately line 64 of ezSearch.cshtml):

        model.SearchTerms = Tokenize(EscapeSearchTerm(model.SearchTerm)).Where(x => !Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x));
    

    This probably isn't the best way to handle it but it does stop the error from happening while returning mostly reasonable results (which is most important for me). You can also customize how you want those characters handled. Since I'm using the Standard analyzer, getting rid of them works best for me but your mileage may vary. There is probably a more efficient way to do this which is why I haven't submitted a pull request.
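
    As a rough, self-contained sketch of what the helper above does to the two failing inputs from the original question: the class name EscapeDemo and the console harness are illustrative assumptions and are not part of ezSearch; only the two Regex.Replace calls come from the post.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class EscapeDemo
    {
        // Same two replacements as the EscapeSearchTerm helper in the post.
        public static string EscapeSearchTerm(string input)
        {
            // colon (:) becomes a space
            input = Regex.Replace(input, @":", " ");

            // a hyphen followed by a non-digit becomes a space
            input = Regex.Replace(input, @"(\D*)-(\D)", "$1 $2");

            return input;
        }

        public static void Main()
        {
            // Brackets make the extra whitespace visible in the console.
            Console.WriteLine("[" + EscapeSearchTerm("solution: explorer") + "]");
            Console.WriteLine("[" + EscapeSearchTerm("terms - service") + "]");
            Console.WriteLine("[" + EscapeSearchTerm("2010-2015") + "]");
        }
    }
    ```

    The colon and the lone hyphen are replaced by spaces, while the hyphen between digits in 2010-2015 is left alone. One thing worth checking for your own data: because (\D*) is greedy, an input such as a-b-c comes out as "a-b c", so only the last hyphen in a run of non-digit text is replaced.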

  • keilo 568 posts 1023 karma points
    Apr 08, 2015 @ 12:35
    keilo
    0

    Hey Lewis

    Many thanks for sharing your changes! This is very clean and makes me wonder why it's not there by default.

    I also noticed the addition of !Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x), which wasn't in the original ezsearch.cshtml and which also solves the issue with stop words such as and/or/by. Without it, a search for text containing Lucene stop words wasn't returning anything.

    Just curious: in your live implementation, did you escape only those two characters (colon and hyphen)? Are the two adequate, or would it be ideal to add others?

    many thanks!

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 08, 2015 @ 14:32
    Ismail Mayat
    1

    When doing Examine / Lucene queries, I have an extension method to remove characters that would break the query:

            /// <summary>
            /// taken from http://stackoverflow.com/questions/263081/how-to-make-the-lucene-queryparser-more-forgiving
            /// </summary>
            /// <param name="query"></param>
            /// <returns></returns>
            public static string MakeSearchQuerySafe(this string query)
            {
                var regex = new Regex(@"[^\w\s-]");
                return regex.Replace(query, "");
            }
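
    A minimal sketch of how the extension method above behaves on a few inputs. The wrapper class names (SearchStringExtensions, SafeQueryDemo) and the sample strings are assumptions for illustration; the regex itself is the one from the post.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class SearchStringExtensions
    {
        // Removes every character that is not a word character, whitespace, or a hyphen.
        public static string MakeSearchQuerySafe(this string query)
        {
            var regex = new Regex(@"[^\w\s-]");
            return regex.Replace(query, "");
        }
    }

    public static class SafeQueryDemo
    {
        public static void Main()
        {
            Console.WriteLine("solution: explorer".MakeSearchQuerySafe()); // solution explorer
            Console.WriteLine("{foo} AND (bar)".MakeSearchQuerySafe());    // foo AND bar
            Console.WriteLine("terms - service".MakeSearchQuerySafe());    // terms - service
        }
    }
    ```

    Note that the character class deliberately keeps the hyphen, so "terms - service" passes through unchanged. If a bare hyphen is a problem for your queries, it would need separate handling.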

    Regards

    Ismail

  • Sebastiaan Janssen 5060 posts 15522 karma points MVP admin hq
    Sep 06, 2016 @ 16:17
    Sebastiaan Janssen
    1

    Thanks Ismail, I just implemented this for Our as well, finally.. :-)

  • keilo 568 posts 1023 karma points
    Apr 08, 2015 @ 15:25
    keilo
    0

    Hey Ismail

    Thanks for sharing this. Can I check how you call this function? Do you call it for every search (by default) or only after some kind of try-catch?

    I'm also wondering what you use for handling stop words when someone searches for a legitimate sentence like "abc and bde by xyz".

    Would it be adequate to use a tokenizer call like this in ezsearch.cshtml?

    model.SearchTerms = Tokenize(MakeSearchQuerySafe(model.SearchTerm)).Where(x => !Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x));

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 08, 2015 @ 15:32
    Ismail Mayat
    1

    keilo,

    You can call it directly on your search term, which is a string. It's an extension method, so:

    model.SearchTerm.MakeSearchQuerySafe()

    I have never done anything with stop words, as I have always used the standard analyser, which removes stop words during indexing and also during querying. I believe ezSearch uses the standard analyser, so in your example:

    abc and bde by xyz

    during indexing you will end up with abc bde xyz in the index. When searching, the stop words will be removed as well.
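
    To illustrate the effect described above without pulling in Lucene, here is a hedged sketch that filters stop words from a query string by hand. The StopWordSketch class, the RemoveStopWords method, the case-insensitive comparison, and the short stop-word list are all assumptions for illustration; the real analyser works from Lucene's own, longer stop-word set.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class StopWordSketch
    {
        // Illustrative subset only; Lucene's ENGLISH_STOP_WORDS_SET is longer.
        private static readonly HashSet<string> StopWords =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "an", "and", "by", "or", "the" };

        // Splits on spaces and drops any term found in the stop-word set.
        public static List<string> RemoveStopWords(string input)
        {
            return input
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(term => !StopWords.Contains(term))
                .ToList();
        }

        public static void Main()
        {
            // The example sentence from this thread.
            Console.WriteLine(string.Join(" ", RemoveStopWords("abc and bde by xyz"))); // abc bde xyz
        }
    }
    ```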

    Regards

    Ismail

  • Mykhailo 4 posts 75 karma points
    Mar 17, 2017 @ 17:50
    Mykhailo
    0

    It is possible and preferable to use the QueryParser class of Lucene. Example:

        var s = "words: apple,...";
        s = QueryParser.Escape(s);
    

    The static Escape method adds a backslash before each special character.
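
    For readers who want to see the idea without referencing Lucene, the following is a hand-rolled approximation of backslash escaping. It is not Lucene's implementation: the QueryEscapeSketch class, the EscapeSpecialChars method, and the SpecialChars list are assumptions based on the documented query syntax, and the list should be checked against the Lucene.Net version in use. In real code, QueryParser.Escape is the safer choice.

    ```csharp
    using System;
    using System.Text;

    public static class QueryEscapeSketch
    {
        // Assumed set of query-syntax characters; verify against your Lucene.Net version.
        private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

        // Prefixes each special character with a backslash and leaves the rest untouched.
        public static string EscapeSpecialChars(string input)
        {
            var sb = new StringBuilder(input.Length * 2);
            foreach (char c in input)
            {
                if (SpecialChars.IndexOf(c) >= 0)
                {
                    sb.Append('\\');
                }
                sb.Append(c);
            }
            return sb.ToString();
        }

        public static void Main()
        {
            Console.WriteLine(EscapeSpecialChars("words: apple,...")); // words\: apple,...
            Console.WriteLine(EscapeSpecialChars("terms - service"));  // terms \- service
        }
    }
    ```

    Unlike the strip-and-replace approaches earlier in the thread, escaping keeps the characters in the query, so how they affect results then depends on the analyser.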

  • Bobi 352 posts 956 karma points
    Apr 26, 2017 @ 16:17
    Bobi
    0

    Hi,

    I'm running into this issue too with special characters like colons :, brackets {}, etc.

    How can I go about fixing this in ezsearch.cshtml?

    I'm assuming I just have to make changes to the public string CleanseSearchTerm, but I'm not sure what to do.

    I have this so far:

    // ==================================================
        //  Helper Functions
        //==================================================
    
        // Cleanse the search term
        public string CleanseSearchTerm(string input)
        {
            return Umbraco.StripHtml(input).ToString();
        }
    
        // Splits a string on space, except where enclosed in quotes
        public IEnumerable<string> Tokenize(string input)
        {
            return Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
                .Cast<Match>()
                .Select(m => m.Value.Trim('\"'))
                .ToList();
        } 
    