Not finding common terms
I am not sure if this is the correct place to post this issue, because it feels like it could be a more general Examine issue, but let's see:
I am using ezSearch and have come across the following strange problem. Let's say I have a node with the name "This is not my cat". A search for "this is not my cat" returns no results.
If I search for just the term "cat", then my item does appear in the search results.
On further investigation, it appears that there is a whole series of words that cause this problem: words like "and", "the", "it", etc.
If I change the name of my node to "thisx isx notx myx cat", then my search for "this is not my cat" does return the correct item. So it finds these common words if they are a fragment of a longer word.
My Examine settings are all just the defaults.
If anybody can shed any light on this, it would be much appreciated.
Richard,
ezSearch uses Examine, which under the hood uses Lucene.Net. Lucene.Net has the concept of analysers, and different ones do different things. ezSearch uses the ExternalIndexer, which uses the standard analyser. This analyser removes English stop words both at the point of indexing and at the point of searching; see http://www.codeproject.com/Articles/32175/Lucene-Net-Text-Analysis for more information. In your example the text
"This is not my cat"
will end up in the index as "my cat" ("this", "is" and "not" are in the default English stop word list; "my" is not). When you search using the same phrase, the stop words should be stripped from the query in the same way, because the query analyser is also the standard analyser, so the search should still find the node.
Regards
Ismail
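To make the stop-word behaviour described above concrete, here is a minimal sketch in Python. It is a toy model, not Lucene.Net or Examine code: the `analyze` and `matches` functions and the `analyze_query` flag are hypothetical names invented for illustration, and the tokenisation is far simpler than what the real standard analyser does. The word list is the default English stop set as I understand it for Lucene 3.x; note that "my" is not in it.

```python
import re

# Default English stop words of Lucene's standard analyser (3.x era, as
# far as I know). Treat the exact contents as an assumption.
LUCENE_ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze(text, stop_words=LUCENE_ENGLISH_STOP_WORDS):
    """Toy stand-in for an analyser: tokenize, then drop stop words."""
    return [t for t in tokenize(text) if t not in stop_words]


def matches(query, document, analyze_query=True):
    """Hypothetical AND-style match against the analysed document.

    With analyze_query=False the raw query tokens are used, which models
    a query that was built without going through the same analyser.
    """
    indexed = set(analyze(document))
    query_tokens = analyze(query) if analyze_query else tokenize(query)
    return bool(query_tokens) and all(t in indexed for t in query_tokens)


print(analyze("This is not my cat"))      # ['my', 'cat']
print(analyze("thisx isx notx myx cat"))  # nothing removed
print(matches("this is not my cat", "This is not my cat"))
print(matches("this is not my cat", "This is not my cat", analyze_query=False))
```

In this model the search only succeeds when the query goes through the same stop-word filtering as the indexed text. If the stop words reach the index lookup unfiltered, they cannot match, because they were never stored. Whether that is what happens in a given ezSearch setup is not something this sketch can tell you.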