  • Daniel Gustafsson 8 posts 88 karma points
    Aug 16, 2019 @ 09:39

    Examine with Swedish characters

    Hi,

    I'm trying to build a search function for a site that is in Swedish. Searching works in general, but when I search with Swedish characters (Å, Ä, Ö) it does not work.

    For example, if I search for Göteborg I get 0 hits, but if I use the term Goteborg instead, it works.

    Does anybody have a solution? Do I need to configure the index for multiple languages?

    Thanks in advance!

  • ErikC 1 post 71 karma points
    Sep 02, 2019 @ 15:08

    Hello Daniel! I'm having the same issue with Swedish characters. Were you able to solve this?

  • Ismail Mayat 4332 posts 9337 karma points MVP 2x admin c-trib
    Sep 02, 2019 @ 16:53

    Are you doing a wildcard search? During indexing, the content runs through the standard analyser (that's if you have not changed it to another analyser), which ASCII-flattens characters, so Å, Ä and Ö go into the index as a, a and o. During searching, the query goes through the same analyser, so a normal search should just work.
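To make the index-time/query-time symmetry concrete, here is a minimal sketch of ASCII folding in plain C#. It is an illustration under stated assumptions, not Lucene's `ASCIIFoldingFilter`: `FoldingSketch` and `Fold` are hypothetical names, and the table covers only the Swedish letters plus a few German ones, whereas Lucene's filter maps far more characters. The point is that running the same fold over the indexed text and over the query makes Göteborg and Goteborg produce the same term.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of ASCII folding + lower-casing, roughly what an
// analyser does to each token. NOT Lucene's ASCIIFoldingFilter: this table
// only covers a handful of characters.
public static class FoldingSketch
{
    private static readonly Dictionary<char, string> FoldTable = new Dictionary<char, string>
    {
        ['Å'] = "A", ['å'] = "a",
        ['Ä'] = "A", ['ä'] = "a",
        ['Ö'] = "O", ['ö'] = "o",
        ['Ü'] = "U", ['ü'] = "u",
        ['ß'] = "ss",
    };

    public static string Fold(string token)
    {
        var sb = new StringBuilder(token.Length);
        foreach (char c in token)
        {
            // Replace a known accented character with its ASCII form,
            // otherwise keep the character unchanged.
            sb.Append(FoldTable.TryGetValue(c, out var ascii) ? ascii : c.ToString());
        }
        // Lower-case after folding, as a lower-case filter in the analyser would.
        return sb.ToString().ToLowerInvariant();
    }
}
```

Both `FoldingSketch.Fold("Göteborg")` and `FoldingSketch.Fold("Goteborg")` return `goteborg`, which is why a normal (analysed) search finds the document with either spelling.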

    If you are doing a wildcard search then, if I remember rightly, the query is not ASCII-flattened: it searches literally for those characters, while the Examine/Lucene index only contains the flattened ones.
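A small simulation of that mismatch, under stated assumptions: the index is modelled as a hypothetical list of terms that were already folded at indexing time, and the wildcard query is modelled as a literal prefix match on the query text, assumed to be lower-cased but never folded. `WildcardSketch`, `IndexedTerms` and `PrefixMatches` are illustrative names, not Examine or Lucene APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation: the index holds ASCII-folded terms, while a
// wildcard query term is compared as-is (lower-cased only, never folded).
public static class WildcardSketch
{
    // Terms as the analyser would have stored them (assumed, for illustration).
    public static readonly string[] IndexedTerms = { "goteborg", "stockholm", "malmo" };

    // Simulates "term*": every indexed term that starts with the literal prefix.
    public static List<string> PrefixMatches(string wildcardQuery)
    {
        string prefix = wildcardQuery.TrimEnd('*').ToLowerInvariant();
        return IndexedTerms
            .Where(t => t.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

`PrefixMatches("Göteborg*")` returns an empty list, while `PrefixMatches("Goteborg*")` returns `goteborg`, mirroring the 0 hits versus hits behaviour described in the question. The ordinal comparison matters: it treats ö and o as different characters, just as a literal term comparison in the index does.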

    I recall covering this in the notes for the Examine course; the code I have is:

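    using Examine.LuceneEngine.Providers;
    using Examine.Providers;
    using Lucene.Net.Analysis;
    using Lucene.Net.QueryParsers;

    public class AsciiFoldingFilter
    {
        private readonly Analyzer _analyzer;

        // We analyze the query *before* adding the wildcards.
        // This way, words containing diacritics (characters specific to a language)
        // are folded to the ASCII character set,
        // e.g. "weiß Glückwunsch" is flattened into "weiss gluckwunsch".
        //
        // When the wildcards are added before analyzing, the text will not be analyzed.
        // https://issues.apache.org/jira/browse/LUCENENET-486
        // http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
        public AsciiFoldingFilter(BaseSearchProvider baseSearchProvider)
        {
            // Reuse the analyzer the index was built with, so the query is
            // folded exactly the same way as the indexed content.
            var luceneSearch = (BaseLuceneSearcher)baseSearchProvider;
            _analyzer = luceneSearch.IndexingAnalyzer;
        }

        public AsciiFoldingFilter(Analyzer analyzer)
        {
            _analyzer = analyzer;
        }

        public string FlattenToAscii(string stringToFold)
        {
            // QueryParser cannot parse an empty query, so return early.
            if (string.IsNullOrWhiteSpace(stringToFold))
            {
                return string.Empty;
            }

            // With an empty default field, Query.ToString() prints the
            // analyzed terms without a "field:" prefix.
            var parser = new QueryParser(
                Lucene.Net.Util.Version.LUCENE_29,
                string.Empty,
                _analyzer);

            // Note: Parse throws a ParseException on invalid query syntax.
            var query = parser.Parse(stringToFold.Trim());
            return query.ToString();
        }
    }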
    

    On the query side, run the query through this AsciiFoldingFilter before you add the wildcard, then add the wildcard, and it should work.
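The order of operations (fold first, then wildcard) can be sketched without Lucene as follows. This is a hypothetical stand-in: in real code `FlattenToAscii` from the class above would do the folding with the index's own analyser, whereas here a tiny hard-coded switch covering only Å, Ä and Ö plays that role so the example runs on its own. `FoldThenWildcardSketch` and `FoldThenWildcard` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical sketch of "fold first, then add the wildcard".
public static class FoldThenWildcardSketch
{
    // Stand-in for FlattenToAscii: folds only the Swedish letters, then lower-cases.
    private static string Fold(string word)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            switch (c)
            {
                case 'Å': case 'Ä': sb.Append('A'); break;
                case 'å': case 'ä': sb.Append('a'); break;
                case 'Ö': sb.Append('O'); break;
                case 'ö': sb.Append('o'); break;
                default: sb.Append(c); break;
            }
        }
        // Lower-case after folding, as the analyser would.
        return sb.ToString().ToLowerInvariant();
    }

    // Folds each word of the user's input, then appends "*" to each one,
    // so every word becomes a prefix (wildcard) term.
    public static string FoldThenWildcard(string userInput)
    {
        var words = userInput.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => Fold(w) + "*"));
    }
}
```

`FoldThenWildcard("Göteborg")` returns `goteborg*`, which now lines up with the folded term stored in the index. Doing it the other way round (adding `*` to the raw input first) would leave `göteborg*` unfolded and, as described above, find nothing.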

    Regards

    Ismail

  • Daniel Gustafsson 8 posts 88 karma points
    Sep 05, 2019 @ 08:31

    Hi,

    Thanks for the reply. After some investigation of my own, I found out that it was indeed the wildcard search that did not flatten the Swedish characters. I also tried a fuzzy search, and that worked as well.
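Why a fuzzy search copes with the unfolded query can be illustrated with edit distance: a fuzzy query matches indexed terms that are within a small edit distance of the query term, and göteborg differs from the indexed goteborg by a single substitution (ö → o). Below is a plain Levenshtein distance sketch; `EditDistanceSketch` is an illustrative name, and Lucene's fuzzy query applies its own similarity threshold on top of the distance, which this sketch does not reproduce.

```csharp
using System;

// Hypothetical illustration: classic Levenshtein distance (insertions,
// deletions and substitutions each cost 1), computed with two rolling rows.
public static class EditDistanceSketch
{
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Swap rows: the row just computed becomes "previous".
            var tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[b.Length];
    }
}
```

`Levenshtein("göteborg", "goteborg")` is 1, so a fuzzy query for the unfolded term still lands close enough to the folded term in the index, whereas a wildcard query requires an exact literal prefix.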

    Thanks for the code. I will try that solution out.

    /Daniel
