  • Hendy Racher 861 posts 3832 karma points MVP 2x admin c-trib
    May 26, 2011 @ 11:54
    Hendy Racher
    0

    Lucene fails when searching on common words

    Hi, does anyone know of any workarounds to prevent Lucene from crashing when searching on any of the common words that Lucene ignores?

    For example, try any of those words (an, and, are...) in the search box above and the following error will occur:

     

    [NullReferenceException: Object reference not set to an instance of an object.]
       Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader) +312
       Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader) +319
       Lucene.Net.Search.IndexSearcher.Rewrite(Query original) +24
       Lucene.Net.Search.Query.Weight(Searcher searcher) +24
       Lucene.Net.Search.Searcher.CreateWeight(Query query) +11
       Lucene.Net.Search.Searcher.Search(Query query, Filter filter, Int32 n, Sort sort) +15
       Examine.LuceneEngine.SearchResults.DoSearch(Query query, IEnumerable`1 sortField) +191
       Examine.LuceneEngine.SearchResults..ctor(Query query, IEnumerable`1 sortField, IndexSearcher searcher) +82
       Examine.LuceneEngine.Providers.LuceneSearcher.Search(ISearchCriteria searchParams) +104

    TIA,

    Hendy

     

  • Hendy Racher 861 posts 3832 karma points MVP 2x admin c-trib
    May 26, 2011 @ 12:08
    Hendy Racher
    0

    I am considering stripping out the known common words from the search before executing the query, but a more robust solution would be better :)

  • Shannon Deminick 1498 posts 5073 karma points hq
    May 26, 2011 @ 12:10
    Shannon Deminick
    0

    It's more just the fact that Lucene doesn't like two-letter anything.

    Also, the word 'and' may indicate a boolean operation to Lucene. Generally this term is removed from the search, depending on which analyzer you are using. Which analyzer are you using to index/search?

    You should not allow people to search on words that are less than 3 chars... it will fail. Or, if you are searching a phrase, you can surround it in quotes. But again, the analyzer should strip out these words, depending on which one you are using.

  • elspiko 133 posts 302 karma points
    May 26, 2011 @ 12:19
    elspiko
    1

    It's interesting. Both points appear to be valid here. If the text contains a word consisting of 2 alphabetic characters, such as "ag", it fails, as does putting the word "with" into a query. Here is an extension method I've just written which seems to do the job:

    public static IEnumerable<string> RemoveInvalidSearchTerms(this IEnumerable<string> terms)
    {
        var invalidSearchTerms = new[]
        {
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to", "was",
            "will", "with"
        };

        // Keep a term only if it has at least two characters, is not exactly two
        // non-digit characters, and is not in the list above (case-sensitive match).
        return terms.Where(term => term.Length >= 2 && !Regex.Match(term, @"^\D{2}$").Success && !invalidSearchTerms.Contains(term)).ToList();
    }
  • Hendy Racher 861 posts 3832 karma points MVP 2x admin c-trib
    May 26, 2011 @ 12:21
    Hendy Racher
    0

    Hi Shannon,

    Thanks for the suggestion. Stripping out all words < 3 chars would be a quick fix, but there are other words that cause Lucene to fall over, such as:

    into, such, that, their, then, there, these...

    We are using the Lucene.Net.Analysis.Standard.StandardAnalyzer

    Thanks,
    Hendy

  • Shannon Deminick 1498 posts 5073 karma points hq
    May 26, 2011 @ 12:25
    Shannon Deminick
    0

    Very strange... TBH I've not seen this issue before, but I can replicate it. Can you log a bug for this at examine.codeplex.com? Perhaps there's a newer Lucene version that has this fixed!

  • Shannon Deminick 1498 posts 5073 karma points hq
    May 26, 2011 @ 12:26
    Shannon Deminick
    0

    Doh! Just realized the stack trace ends with:

    Examine.LuceneEngine.SearchCriteria.LuceneSearchCriteria.GetFieldInternalQuery(String fieldName, IExamineValue fieldValue, Boolean useQueryParser) +1583

    So it might just be something weird going on in the Examine codebase. If you could log a bug, that'd be fantastic.

  • Hendy Racher 861 posts 3832 karma points MVP 2x admin c-trib
    May 26, 2011 @ 12:30
    Hendy Racher
    0

    @Richard - thanks for that extension method :) looks like that's the quick fix to ensure search doesn't bomb.

    @Shannon - sure, I'll log a bug now.

  • Shannon Deminick 1498 posts 5073 karma points hq
    May 27, 2011 @ 04:23
    Shannon Deminick
    0

    I've fixed this with the latest changeset:

    http://examine.codeplex.com/workitem/10326

    Hopefully will get a new version out in the coming weeks.

  • Shannon Deminick 1498 posts 5073 karma points hq
    May 30, 2011 @ 07:01
    Shannon Deminick
    0

    the beta is released:

    http://examine.codeplex.com/releases/view/67118

    please test if you can.

  • daveb.84 21 posts 46 karma points
    Nov 04, 2011 @ 15:26
    daveb.84
    2

    This problem still occurs if I try to boost search terms. In my example, I'm trying to boost results where the search term matches the node name. I still get the exception:

    string[] searchTerms = "the search term".Split(' ');
    
    var provider = ExamineManager.Instance.SearchProviderCollection["WebsiteSearcher"];
    
    ISearchCriteria searchCriteria = provider.CreateSearchCriteria(BooleanOperation.Or);
    
    foreach (var term in searchTerms)
    {
        // boost the results that match the title of the page.
        searchCriteria.Field("nodeName", term.Boost(_matchNodeNameBoostValue)).Or();
    }

    What I ended up doing is tapping into Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET, checking each of my search words against it, and filtering out any that are stop words. This seems to fix the problem:

    string[] searchTerms = "the search term".Split(' ');
    
    searchTerms = searchTerms.Where(x => !StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x.ToLower())).ToArray();
    

    Hope this helps anyone in a similar predicament.
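
    The same filtering idea can be sketched without a Lucene dependency. This is only a minimal sketch: SearchTermFilter and Clean are hypothetical names, not part of Examine or Lucene.Net, and the word list is simply the one posted earlier in this thread. Unlike a plain Contains on an array, it matches case-insensitively and tolerates repeated whitespace:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper for illustration; not part of Examine or Lucene.Net.
    public static class SearchTermFilter
    {
        // Word list copied from the extension method posted earlier in this thread.
        // The comparer makes lookups case-insensitive ("The" and "the" both match).
        private static readonly HashSet<string> StopWords = new HashSet<string>(
            new[]
            {
                "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
                "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
                "the", "their", "then", "there", "these", "they", "this", "to", "was",
                "will", "with"
            },
            StringComparer.OrdinalIgnoreCase);

        // Splits the raw query on whitespace and drops stop words.
        // Returns an empty array when nothing searchable is left.
        public static string[] Clean(string rawQuery)
        {
            if (string.IsNullOrWhiteSpace(rawQuery))
            {
                return new string[0];
            }

            return rawQuery
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Where(term => !StopWords.Contains(term))
                .ToArray();
        }
    }
    ```

    If Clean returns an empty array (for example, when the whole query consists of stop words), it is probably safest to skip the Examine call and show "no results" rather than build criteria with no terms.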

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 12:22
    Pushpendra Singh
    0

    Hi All,

    My meta keyword field contains an alphanumeric value (e.g. test1).

    The meta keyword field is boosted. When I search for the term "test1", the search returns nothing. Does anyone know how to fix this in Examine search so that users can search for alphanumeric words as well?

    Regards,

    Pushpendra singh

  • Ismail Mayat 4437 posts 9758 karma points MVP 2x admin c-trib
    Jul 09, 2014 @ 12:59
    Ismail Mayat
    0

    Pushpendra,

    Which analyser are you using when indexing and searching? You can determine this by looking at your Examine config files and checking which index you are using. Also, can you look in the index and see if that term is there? If you are using Umbraco 6, you can use the Examine Inspector package; if you are using v7, you can use the search tools provided in the Umbraco backoffice.

    Regards

    Ismail

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 14:04
    Pushpendra Singh
    0

    Ismail,

    I am using two analyzers, WhitespaceAnalyzer and StandardAnalyzer, in my Examine settings config.

    My field is present in ExamineIndex.config. The problem occurs only for alphanumeric terms, not for purely alphabetic ones.

    My umbraco version is 4.11.8.

    Regards,

    Pushpendra singh

  • Ismail Mayat 4437 posts 9758 karma points MVP 2x admin c-trib
    Jul 09, 2014 @ 15:13
    Ismail Mayat
    0

    Which index are you searching on, External? Also, when you say you are using 2 analyzers, is one for searching and the other for indexing? They need to be the same for indexing and searching on each index, otherwise you will get unexpected search results. PS: please do not cross-post; you have added a reply in another post.

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 15:26
    Pushpendra Singh
    0

    Ismail,

    Sorry for the cross-post. Could you please delete it? I am unable to delete it because of the known issue with deleting posts from the forum.

    I am searching a custom index set (MySiteSearchIndexSet) that is for my site, not the internal or external one:

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 15:54
    Pushpendra Singh
    0

    Ismail,

    I am trying to set a default provider in "ExamineIndexProviders", but it throws an error, and if I remove the default provider from "ExamineSearchProviders", it also throws an error. I am still trying; if I have any success I will let you know.

    Regards,

    Pushpendra

  • Pushpendra Singh 61 posts 116 karma points
    Jul 09, 2014 @ 16:15
    Pushpendra Singh
    0
    Ismail,
    Please find my Examine settings config file below:
    <Examine>
      <ExamineIndexProviders >
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
          <add name="MySiteSearchIndexer" type="UmbracoExamine.LuceneExamineIndexer, UmbracoExamine" supportUnpublished="false" supportProtected="true" interval="10" indexSet="MySiteSearchIndexSet" />
          <add name="MyImageSearchIndexer" type="UmbracoExamine.LuceneExamineIndexer, UmbracoExamine" supportUnpublished="false" supportProtected="true" interval="10" indexSet="MyImageSearchIndexSet" />
        </providers>
      </ExamineIndexProviders>
      <ExamineSearchProviders defaultProvider="InternalSearcher">
        <providers>
          <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
          <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
          <add name="MySiteSearch" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MySiteSearchIndexSet" />
          <add name="MyImageSearch" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" indexSet="MyImageSearchIndexSet" />
        </providers>
      </ExamineSearchProviders>
    </Examine>
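
    One thing worth ruling out in the config above: MySiteSearchIndexer and MySiteSearch do not set an analyzer attribute, while the other providers do. A sketch of the two custom entries with the analyzer stated explicitly on both sides, so that indexing and searching are certain to agree, might look like this. StandardAnalyzer is only an assumption here, not a confirmed fix for the alphanumeric problem, and whichever analyzer is chosen, the index needs to be rebuilt after the change:

    ```xml
    <!-- inside <ExamineIndexProviders><providers> -->
    <add name="MySiteSearchIndexer" type="UmbracoExamine.LuceneExamineIndexer, UmbracoExamine"
         supportUnpublished="false" supportProtected="true" interval="10"
         indexSet="MySiteSearchIndexSet"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

    <!-- inside <ExamineSearchProviders><providers> -->
    <add name="MySiteSearch" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         indexSet="MySiteSearchIndexSet"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    ```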