

  • Topic author was deleted

    Apr 16, 2014 @ 16:24

    Setting up Lucene/Examine for hyphenated words

    So I'm looking for the best Examine/Lucene setup for searching hyphenated words.

    Using the StandardAnalyzer (for both indexing and querying), words like M1-1234 are not found when the search keyword is M1, M1- or M1-*.

    Any experience with which analyzer combinations make this work would be appreciated.

    Cheers,

    Kevin

  • Comment author was deleted

    Apr 16, 2014 @ 16:35

    Looks like using the WhitespaceAnalyzer for both indexing and searching does the trick, but it appears to be case-sensitive. I have to work through that next.
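    Whitespace tokenization keeps M1-1234 as one token, but it also keeps the original casing, so M1-1234 and m1-1234 become different terms. One common way to handle this in Lucene.Net is a small custom analyzer that wraps a WhitespaceTokenizer in a LowerCaseFilter and is used for both indexing and searching. The sketch below does not call Lucene at all: it is a plain C# simulation (the class and method names are hypothetical) that only shows which terms such an analyzer would produce and why casing stops mattering.

```csharp
using System;
using System.Linq;

// Hypothetical helper: a plain C# simulation of "whitespace tokenizer + lowercase filter".
// It is NOT Lucene.Net code; it only shows which terms such an analyzer would produce.
public static class LowercaseWhitespaceSketch
{
    // Split on whitespace only (so "M1-1234" stays one token), then lowercase each token.
    public static string[] Tokenize(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = any whitespace
            .Select(t => t.ToLowerInvariant())
            .ToArray();
    }

    // Exact-term match: every query token must appear among the indexed tokens.
    // Index and query text go through the same Tokenize step, so casing no longer matters.
    public static bool Matches(string indexedText, string query)
    {
        var indexed = Tokenize(indexedText);
        return Tokenize(query).All(q => indexed.Contains(q));
    }
}
```

    With this setup, m1-1234 and M1-1234 both match, but a bare M1 is still not equal to the token m1-1234, so partial matches would still need a prefix/wildcard query such as m1*.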

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 16, 2014 @ 16:39

    Kevin,

    The StandardAnalyzer will definitely remove the "-" and any other non-alphanumeric characters, so m1-1234 will end up in the index as m1 1234.

    I am also certain that, under the hood, the searcher (if it uses the StandardAnalyzer) will take the term m1-1234 and turn it into m1 1234. Depending on how you code your search, you may then just be searching for m1 on its own.

    What I usually do (and ezSearch has something similar) is take the term, test whether it contains a space (or, in your case, a "-"), split it out, and then build the query as a GroupedOr, so that in Lucene it becomes something like +(contents:m1 contents:1234). See below:

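            // Multiple values in one field: split on spaces and OR the terms together.
            if (qsValue.Contains(" "))
            {
                // RemoveEmptyEntries stops double spaces from producing empty terms.
                string[] terms = qsValue.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                queryToBuild.And().GroupedOr(new List<string> { key }, terms);
            }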
    

    You could change this to GroupedAnd. I also run my search term through the following before building the query:

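        using System.Text.RegularExpressions;

        // Extension methods must live in a static class; the class name here is arbitrary.
        public static class SearchExtensions
        {
            /// <summary>
            /// Removes every character that is not a word character, whitespace or a hyphen.
            /// Taken from http://stackoverflow.com/questions/263081/how-to-make-the-lucene-queryparser-more-forgiving
            /// </summary>
            /// <param name="query">The raw search term.</param>
            /// <returns>The term with the other characters removed.</returns>
            public static string MakeSearchQuerySafe(this string query)
            {
                var regex = new Regex(@"[^\w\s-]");
                return regex.Replace(query, "");
            }
        }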
    

    You may want to change the regex to replace those characters with a space instead of removing them. In some old code I also have:

            // Hyphenated value: pass it through Examine's Escape() rather than splitting it.
            if (qsValue.Contains("-"))
            {
                queryToBuild = queryToBuild.And().Field(key, qsValue.Escape());
            }
    

    However, I have no idea why I did that. Oh Anthony, I am sorry for not commenting enough!!! (During code reviews Anthony Dang always used to tell me off for not commenting enough, doh!!)
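    Whatever the original reason, "-" is one of the characters Lucene's query syntax reserves (a leading "-" means "must not match"), so values containing it are often escaped before going into a raw query. The sketch below backslash-escapes the classic set of Lucene query-parser special characters; it is a hypothetical helper illustrating Lucene-level escaping, not a copy of Examine's Escape(), which may work differently internally.

```csharp
using System.Text;

// Hypothetical helper: backslash-escapes the characters that Lucene's classic
// query-parser syntax treats specially, in the spirit of Lucene's QueryParser.Escape.
// It is not Examine's .Escape() implementation.
public static class LuceneEscapeSketch
{
    // \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    public static string Escape(string term)
    {
        var sb = new StringBuilder(term.Length * 2);
        foreach (var c in term)
        {
            // Prefix each special character with a backslash so the parser reads it literally.
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

    For example, M1-1234 becomes M1\-1234 and M1-* becomes M1\-\*, so the hyphen and the asterisk are both read as literal characters rather than operators.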

    Have a play with the above and see if it gets you any further. However, I reckon using MakeSearchQuerySafe, then testing for a space and creating a GroupedOr or GroupedAnd, should do the trick.
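    As a rough sketch of that combination: a variant of MakeSearchQuerySafe that replaces unwanted characters (this time including the hyphen) with a space instead of deleting them, followed by a grouped clause on one field. The class and method names are hypothetical, and the query is built by hand as a raw Lucene string rather than through Examine's GroupedOr/GroupedAnd; the OR form reproduces the +(contents:m1 contents:1234) example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers (not Examine's API): a MakeSearchQuerySafe variant that
// replaces instead of removes, followed by a hand-built grouped Lucene clause.
public static class SearchTermSketch
{
    // Anything that is not a word character (\w) is treated as a separator.
    private static readonly Regex NonWord = new Regex(@"[^\w]+");

    public static string[] SplitSearchTerms(string query)
    {
        // "M1-1234" -> "M1 1234"; a run of separators collapses into one space.
        var cleaned = NonWord.Replace(query, " ");

        // Drop the empty entries left by leading or trailing separators.
        return cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Renders the terms as one required group on a single field.
    // requireAll = false gives +(contents:m1 contents:1234)   (GroupedOr-style)
    // requireAll = true  gives +(+contents:m1 +contents:1234) (GroupedAnd-style)
    public static string BuildGroupedQuery(string field, string query, bool requireAll)
    {
        var prefix = requireAll ? "+" : "";
        var clauses = SplitSearchTerms(query)
            // Lowercased to line up with what the StandardAnalyzer writes to the index.
            .Select(t => prefix + field + ":" + t.ToLowerInvariant());
        return "+(" + string.Join(" ", clauses) + ")";
    }
}
```

    Note that this also strips the "*" from a query like M1-*, so prefix matching would need the wildcard re-added to each term.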

    Regards

    Ismail

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 15:04

    Ismail,

    I am using two analyzers, WhitespaceAnalyzer as well as StandardAnalyzer, in my ExamineSettings.config.

    My field is present in ExamineIndex.config. The problem occurs only for alphanumeric searches (e.g. "test1") in the boosted field (metakeyword), not for purely alphabetic ones.

    My Umbraco version is 4.11.8.

    Regards,

    Pushpendra Singh

  • Michaela Ivanova 12 posts 104 karma points
    Nov 18, 2016 @ 09:07

    Actually, it splits the word on '-'. The problem is not in the index or the settings. Try using the UmbracoHelper class and its TypedSearch(keywords, false, "YourSearcher") method; note that useWildCards is set to false. For more info, see the Search method here: https://github.com/umbraco/Umbraco-CMS/blob/dev-v7/src/Umbraco.Web/PublishedContentExtensions.cs
