
  • Jonathan Lathigee 56 posts 99 karma points
    Apr 14, 2011 @ 18:56

    Common words in examine

    It looks like Lucene does not index common words ("the", "of", "as", etc.), which makes sense. However, when I (or, more precisely, the client) query the index with Examine and the search term contains any common words, I get no results.

    If I enter "hauling in a line" in Luke, I get results, because Luke parses the phrase as:

    description:hauling description:in description:a description:line

    (i.e. a whole bunch of "OR"s)

    But when I use GroupedOr in Examine, the same phrase gets parsed as:

    +(description:hauling) +(description:in) +(description:a) +(description:line)

    (i.e. a whole bunch of "AND"s)

    1. I don't know how to programmatically set up a query in Examine so that it returns ORs (instead of ANDs). How do I do that?

    2. I actually don't think I want to do that, as ANDs are more true to how I want to query in this instance. Is there any way to handle common words in the search phrase?

    Thanks Umbracans!

    Jonathan
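
To make the two parsed forms above concrete, here is a minimal Python sketch (not Examine's API) that builds both Lucene query strings from the same phrase, plus an AND form with common words dropped before the query is built. The `STOP_WORDS` set is an assumption modelled on Lucene's default English stop list; check it against the analyzer version you actually use. The function names are hypothetical.

```python
# Assumed stop-word list, modelled on Lucene's default English set.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def or_query(field, phrase):
    """Optional clauses: a document matches if ANY term matches."""
    return " ".join(f"{field}:{t}" for t in phrase.lower().split())


def and_query(field, phrase, drop_stop_words=False):
    """Required (+) clauses: a document matches only if EVERY term matches."""
    terms = phrase.lower().split()
    if drop_stop_words:
        # Remove terms the index is assumed never to contain.
        terms = [t for t in terms if t not in STOP_WORDS]
    return " ".join(f"+({field}:{t})" for t in terms)


print(or_query("description", "hauling in a line"))
print(and_query("description", "hauling in a line"))
print(and_query("description", "hauling in a line", drop_stop_words=True))
```

With stop words dropped, the AND form only requires terms that can exist in the index, so the "all terms must match" behaviour is kept without the query becoming unsatisfiable.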

  • Jonathan Lathigee 56 posts 99 karma points
    Apr 17, 2011 @ 18:07

    Has anyone else had this problem?

    I really need a way to filter out common words (or otherwise handle them) in Examine queries, and I'm working to a deadline.

    Many thanks

    Jonathan

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Apr 18, 2011 @ 10:21

    This makes some good reading to start with: http://www.aaron-powell.com/lucene-analyzer

    Quote:

    Whitespace Analyzer

    The WhitespaceAnalyzer is also a bit of a sub-set of the StandardAnalyzer, where it understands word breaks in English text, based on spaces and line breaks.

    This Analyzer is great if you want to search on any English word, it doesn't ignore stop words so you can search on a or the if required. This was how I got around the problem I described above.

    Which analyzer are you using?

    When I work with Examine I often revert to using a raw Lucene query, e.g.:

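    // "+" marks the clause as required; the trailing "~" makes it a fuzzy match.
    SearchQuery = string.Format("+{0}:{1}~", SearchField, SearchTerm.Text);

    // Pass the raw Lucene syntax to Examine instead of using the fluent API.
    var query = searchCriteria.RawQuery(SearchQuery);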
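
One caveat with the raw query format above: in Lucene's classic query syntax a `field:` prefix applies only to the term that directly follows it, so a multi-word search term needs one clause per word. A minimal Python sketch of that string building follows (the function name is hypothetical, and it does not escape Lucene special characters, which real input would need):

```python
def fuzzy_raw_query(field, text):
    """Build one required, fuzzy clause per whitespace-separated word.

    Assumption: the input contains no Lucene special characters
    (e.g. + - ! ( ) : ~ *), since no escaping is done here.
    """
    return " ".join(f"+{field}:{word}~" for word in text.split())


print(fuzzy_raw_query("description", "hauling line"))
```

The resulting string could then be passed to a raw query call such as the `RawQuery` one shown above.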


  • Jonathan Lathigee 56 posts 99 karma points
    Apr 19, 2011 @ 00:02

    Thanks for this, Darren.

    I'm just slowly peeling back the layers of the onion that is Umbraco...

    It turns out I was using the whitespace analyzer for the searcher and the standard analyzer for the indexer. I had pretty much copied and pasted the values without understanding what they are or do. That link makes things a lot clearer. Now that I am using the whitespace analyzer for both, common terms are indexed and handled correctly at query time.

    Thanks again

    Jonathan
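
The analyzer mismatch described in this thread can be reproduced with a toy inverted index. This is a rough Python simulation, not Lucene or Examine code: `standard_like` and `whitespace_like` are hypothetical stand-ins for the two analyzers, and the stop-word list is abbreviated and assumed.

```python
# Abbreviated, assumed stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "as", "in", "of", "the"}


def standard_like(text):
    """Rough stand-in for a stop-word-aware analyzer: lower-case, split, drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def whitespace_like(text):
    """Rough stand-in for a whitespace analyzer: split only, keep every token."""
    return text.split()


def build_index(docs, analyzer):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all_terms(index, query, analyzer):
    """AND search: return ids of documents containing every analyzed query term."""
    terms = analyzer(query)
    if not terms:
        return set()
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return result


docs = {1: "hauling in a line", 2: "coiling the line"}

# Indexed with stop words removed, queried with stop words kept: "in" and "a" can never match.
mismatched = build_index(docs, standard_like)
print(search_all_terms(mismatched, "hauling in a line", whitespace_like))  # set()

# Same analyzer on both sides: every query term exists in the index.
matched = build_index(docs, whitespace_like)
print(search_all_terms(matched, "hauling in a line", whitespace_like))  # {1}
```

In this toy model, using the stop-word-aware analyzer on both sides also returns document 1, because the common words are dropped from the query as well as from the index. What matters is that the indexer and the searcher agree.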
