Strange results from a Lucene search
Hi all,
I've been working on extending an existing Umbraco 7.8.1 site. This site has a global search (global in the sense that the search form is on every page and searches every published document).
I've added a new section into the content tree and one of the requirements was a local search for the new section that only returned published content that was in the new section.
I just call the normal search and then filter out any unwanted pages, which works very nicely indeed.
However, I've got a problem with the results. I have one page in the new section with its Name set to "An afternoon of foo bar", and it also has a property with exactly the same value, which is used to populate the title on a Hero component. I can see both of these properties and their values in the search index.
If I search for "afternoon" I get one result as expected. If I search for "an afternoon" I get zero results. The word "afternoon" does not appear in any other properties on that page. Lucene can only be getting it from the two properties mentioned.
Why can't Lucene find "an afternoon"? I know from working with Lucene on previous non-Umbraco projects that common words (such as "it", "and", "the") are treated as noise and discarded as search terms. Shouldn't this reduce the search term to "afternoon" and return the same result as searching for "afternoon" manually?
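A toy sketch of that expectation, written in plain Python rather than Lucene. The stop-word set is modelled on Lucene's classic English list (treat it as an assumption), and `analyze` is a hypothetical helper that lowercases, splits on non-alphanumerics and drops stop words:

```python
import re

# Assumption: modelled on Lucene's classic English stop-word list.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def analyze(text, stop_words=STOP_WORDS):
    """Hypothetical analyser: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(analyze("An afternoon of foo bar"))  # ['afternoon', 'foo', 'bar']
print(analyze("an afternoon"))             # ['afternoon']
print(analyze("afternoon"))                # ['afternoon']
```

If the same analysis ran on both the indexed text and the query, "an afternoon" and "afternoon" would reduce to the same single term.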
I don't know the answer to this specific question, but in the past the following link helped me a lot:
Examine search documentation
Thank you, Frans. I'll check it out.
David,
Can you look through the source code and see which searcher is being used to do the search? Then look in the Examine settings config file (ExamineSettings.config) for that searcher to see which analyser it uses. Can you also check which analyser is being used for indexing?
If the analyser for both indexing and searching is the standard analyser, then the word "an", which is an English stop word, will be removed both by the indexer and by the query parser, so your query should get matches. If the analyser is a stop word analyser, then your query will be a phrase query: it will try to match your query exactly, and if the words are not adjacent it will not match. In that case you would need to use a grouped OR when building your query.
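To illustrate the difference between a phrase query and a grouped OR, here is a toy positional index in Python. It is not Lucene or Examine code: `tokens`, `index_positions`, `phrase_match` and `grouped_or_match` are hypothetical helpers, the stop-word set is a small assumed subset, and the "index dropped 'an' but the query kept it" case is an assumed analyser mismatch, included to show how a phrase can fail while an OR still matches.

```python
import re

STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})  # small assumed subset


def tokens(text, remove_stop_words):
    """Toy query analyser: lowercase, split, optionally drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if not (remove_stop_words and w in STOP_WORDS)]


def index_positions(text, remove_stop_words):
    """Toy index: map each kept term to its positions (dropped words leave a gap)."""
    positions = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        if remove_stop_words and word in STOP_WORDS:
            continue
        positions.setdefault(word, set()).add(pos)
    return positions


def phrase_match(index, query_terms):
    """True only if all query terms occur at consecutive positions."""
    if not query_terms or query_terms[0] not in index:
        return False
    for start in index[query_terms[0]]:
        if all(start + i in index.get(t, set()) for i, t in enumerate(query_terms)):
            return True
    return False


def grouped_or_match(index, query_terms):
    """True if any single query term is present (OR semantics)."""
    return any(t in index for t in query_terms)


doc = "An afternoon of foo bar"
idx = index_positions(doc, remove_stop_words=True)
print(phrase_match(idx, tokens("an afternoon", True)))       # True: "an" dropped on both sides
print(phrase_match(idx, tokens("an afternoon", False)))      # False: query kept "an", index did not
print(grouped_or_match(idx, tokens("an afternoon", False)))  # True: "afternoon" alone is enough
idx_all = index_positions(doc, remove_stop_words=False)
print(phrase_match(idx_all, ["afternoon", "bar"]))           # False: words are not adjacent
print(grouped_or_match(idx_all, ["afternoon", "bar"]))       # True
```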
Could you also write out the generated query so we can see what's going on? In your search code, where you call .Search, you will be passing in a criteria/query object; calling .ToString() on that object will give you the generated Lucene query.
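As a rough guide to reading that output, this Python sketch renders the two query shapes in classic Lucene query syntax: a phrase is quoted, while a grouped OR lists each field/term pair as an optional clause inside one required group. `nodeName` is the field Umbraco indexes a node's Name under; `heroTitle` is a hypothetical property alias, both helper functions are hypothetical, and the exact text Examine's ToString() prints will be formatted differently.

```python
def phrase_clause(field, words):
    """Classic Lucene syntax for a phrase query on one field: field:"w1 w2"."""
    return f'{field}:"{" ".join(words)}"'


def grouped_or_clause(fields, words):
    """Each field/word pair as an optional (OR) clause; the group itself is required (+)."""
    terms = " ".join(f"{f}:{w}" for f in fields for w in words)
    return f"+({terms})"


fields = ["nodeName", "heroTitle"]  # heroTitle is a hypothetical property alias
print(phrase_clause("nodeName", ["an", "afternoon"]))
# nodeName:"an afternoon"
print(grouped_or_clause(fields, ["an", "afternoon"]))
# +(nodeName:an nodeName:afternoon heroTitle:an heroTitle:afternoon)
```

Seeing quotes around the search terms in the generated query is a hint that a phrase query is being built.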
Regards
Ismial