Common words in Examine
It looks like Lucene does not index common words ("the", "of", "as", etc.), which makes sense. However, when I query against the index with Examine (or more precisely, when the client queries), if the search term contains any common words, I get no results.
If I enter "hauling in a line" in Luke, I get results, because Luke parses the phrase as:
description:hauling description:in description:a description:line
(ie a whole bunch of "OR"s)
But when I use GroupedOr in Examine, the same phrase gets parsed as:
+(description:hauling) +(description:in) +(description:a) +(description:line)
(ie a whole bunch of "AND"s)
1. I don't know how to programmatically set up a query in Examine so that it returns ORs (instead of ANDs). How do I do that?
2. I actually don't think I want to do that, as ANDs are more true to how I want to query in this instance. Is there any way to handle common words in the search phrase?
Thanks Umbracans!
Jonathan
Has anyone else had this problem?
I really need a way to filter out common words (or otherwise handle them) in Examine queries, and I'm working to a deadline.
Many thanks
Jonathan
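One way to handle this on the query side is to strip the common words from the phrase before the query is built, so the remaining terms can still be ANDed together. The following is only a rough sketch: the class name StopWordQuery and the method BuildAndQuery are made up for illustration, and the stop word list is an assumption that is meant to mirror the default English list of Lucene's StandardAnalyzer, so check it against the analyzer you actually index with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Examine or Lucene.Net.
public static class StopWordQuery
{
    // Assumption: mirrors the default English stop words of StandardAnalyzer.
    // Adjust this list to match the analyzer used by your indexer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[]
        {
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
            "such", "that", "the", "their", "then", "there", "these", "they",
            "this", "to", "was", "will", "with"
        },
        StringComparer.OrdinalIgnoreCase);

    // Builds a raw Lucene query string in which every non-stop-word term
    // is required (+field:term), i.e. AND semantics.
    // Note: Lucene special characters in the phrase are NOT escaped here.
    public static string BuildAndQuery(string field, string phrase)
    {
        var terms = (phrase ?? string.Empty)
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !StopWords.Contains(term))
            // Lowercased because StandardAnalyzer lowercases terms at index time.
            .Select(term => "+" + field + ":" + term.ToLowerInvariant());

        return string.Join(" ", terms);
    }
}
```

With the phrase from the question, StopWordQuery.BuildAndQuery("description", "hauling in a line") gives +description:hauling +description:line, and that string could then be passed to RawQuery on the search criteria. If every word in the phrase is a stop word, the result is an empty string, so that case needs its own handling before searching.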
This makes some good reading to start with: http://www.aaron-powell.com/lucene-analyzer
Quote:
Whitespace Analyzer
The WhitespaceAnalyzer is also a bit of a sub-set of the StandardAnalyzer, where it understands word breaks in English text, based on spaces and line breaks.
This Analyzer is great if you want to search on any English word, it doesn't ignore stop words so you can search on a or the if required. This was how I got around the problem I described above.
Which analyzer are you using?
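When I work with Examine I often revert to using a raw Lucene query, e.g.

    SearchQuery = string.Format("+{0}:{1}~", SearchField, SearchTerm.Text);
    var query = searchCriteria.RawQuery(SearchQuery);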
Thanks for this Darren
I'm just slowly peeling back the layers of that onion that is Umbraco...
Turns out I was using the whitespace analyzer for the searcher, and the standard analyzer for the indexer. I had pretty much just copied and pasted the values, without understanding what they are or what they do. That link makes things a lot clearer. Now that I am using the whitespace analyzer for both, common terms are being indexed and correctly handled at query time.
Thanks again
Jonathan
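For anyone hitting the same mismatch, this is roughly what matching analyzers might look like in ExamineSettings.config. Treat it as a sketch only: the provider names (ExternalIndexer, ExternalSearcher) and the type attributes are assumptions based on a typical Umbraco install, and other attributes are left out. The part that matters is that both analyzer attributes name the same analyzer.

```xml
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- Assumed provider name and type; keep whatever your config already has -->
      <add name="ExternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" />
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="ExternalSearcher">
    <providers>
      <!-- Same analyzer as the indexer, so query terms are tokenized the same way -->
      <add name="ExternalSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" />
    </providers>
  </ExamineSearchProviders>
</Examine>
```

Content that was indexed under the old analyzer stays tokenized the old way, so the index most likely needs rebuilding after changing the indexer's analyzer.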