A workaround for this is to modify the query text after the query has been compiled (i.e. once it has been tokenized and stemmed by GermanAnalyzer).
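// Requires: using System.Text.RegularExpressions;
// Compile the query; GermanAnalyzer has already stemmed the __IndexType value here.
var rawQuery = query.Compile().ToString();
// Pull the Lucene query text out of the compiled criteria's string form.
var queryMatch = Regex.Match(rawQuery,
    @"LuceneQuery: (?<query>.*?)\s*}$");
var luceneQueryText = queryMatch.Groups["query"].Value;
// Swap the stemmed term for a wildcard term; wildcard terms are not stemmed,
// so "content*" survives. "$" means this only fires when the term ends the query.
var fixedRawQuery = Regex.Replace(
    luceneQueryText,
    "__IndexType:con$",
    "__IndexType:content*");
// Run the rewritten text as a raw Lucene query.
var fixedCriteria = contentSearcher.CreateSearchCriteria();
fixedCriteria.RawQuery(fixedRawQuery);
var contentSearchResults = contentSearcher.Search(fixedCriteria);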
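To see what the two regular expressions in the workaround do on their own, here is a small self-contained console sketch that applies them to a sample compiled-query string. The sample string and the `IndexTypeRewrite`/`Rewrite` names are hypothetical, and the exact `ToString()` format of a compiled Examine query is an assumption that may differ between Examine versions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndexTypeRewrite
{
    // Extracts the Lucene query text from a compiled-criteria string and
    // replaces the stemmed "__IndexType:con" term with the wildcard term
    // "content*", using the same two patterns as the workaround.
    public static string Rewrite(string rawQuery)
    {
        var queryMatch = Regex.Match(rawQuery, @"LuceneQuery: (?<query>.*?)\s*}$");
        if (!queryMatch.Success)
        {
            // The string did not have the assumed "LuceneQuery: ... }" shape.
            throw new FormatException("Unexpected compiled query format: " + rawQuery);
        }

        var luceneQueryText = queryMatch.Groups["query"].Value;

        // "$" anchors the match, so the term is only rewritten when it is
        // the last thing in the query text.
        return Regex.Replace(luceneQueryText, "__IndexType:con$", "__IndexType:content*");
    }

    public static void Main()
    {
        // Hypothetical output of query.Compile().ToString().
        var rawQuery = "{ SearchIndexType: content, LuceneQuery: +__IndexType:con }";
        Console.WriteLine(Rewrite(rawQuery)); // prints "+__IndexType:content*"
    }
}
```

Because of the `$` anchor, a query such as `+__IndexType:con +nodeName:test` is left unchanged; if other clauses can follow the type clause, a word-boundary pattern like `__IndexType:con\b` may be a better fit, though that is an untested suggestion rather than part of the original workaround.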
German analyser
Guys,
I'm working on search for German using Lucene.Net contrib and the German analyser. When I do a query, the generated query looks like:
You'll notice it has cut off the last bit; it should read __IndexType:content
When using the standard analyser, the query is generated fine. Has anyone else seen this before?
Regards
Ismail
My first guess would be that the German stemmer, or some other part of the German analyser pipeline, is changing the word. Do you get the same result if you put "content" in the contents field? E.g.
Tim,
No, if you type the word content you get
Something else is borking it. GRRRR
Tim,
Actually, you are right. We took off the wildcard, ran the query again, and now we get
Rats!!
Ismail
Tim,
Very interesting read: http://www.evelix.ch/unternehmen/Blog/evelix/2013/11/11/inner-workings-of-the-german-analyzer-in-lucene It could be the stemmer mashing it up. It says that if a word is wildcarded, it's not stemmed.
So maybe this is the issue.
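The claim that wildcarded words skip stemming can be checked directly against Lucene.Net's query parser. This is a minimal sketch, assuming Lucene.Net 3.0.3 with the Contrib analyzers package (`Lucene.Net.Analysis.De.GermanAnalyzer`); the Examine build in use may bundle a different Lucene.Net version, so the constructor overloads may need adjusting. The `WildcardStemmingDemo`/`Parse` names are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.De;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
// Alias avoids a clash with System.Version.
using Version = Lucene.Net.Util.Version;

public static class WildcardStemmingDemo
{
    // Parses text against the "__IndexType" field using the German analyser,
    // the same field and analyser combination discussed in this thread.
    public static Query Parse(string text)
    {
        var parser = new QueryParser(Version.LUCENE_30, "__IndexType",
                                     new GermanAnalyzer(Version.LUCENE_30));
        return parser.Parse(text);
    }

    public static void Main()
    {
        // Plain term: passed through the analyser, so the German stemmer changes it.
        Console.WriteLine(Parse("content"));
        // Wildcard term: becomes a PrefixQuery and is not passed through the analyser.
        Console.WriteLine(Parse("content*"));
    }
}
```

The plain term comes back as a stemmed `TermQuery`, while `content*` is turned into a `PrefixQuery` whose text QueryParser only lowercases, which is why the workaround rewrites the clause to `__IndexType:content*`.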
Regards
Ismail
This Stack Overflow post might have some clues: it looks like, in the Java version at least, you can specify words that don't get stemmed. I'm not sure if that's the case in the .NET port, though.
Tim,
That looks like a list of stop words, not words to exclude from stemming?
Regards
Ismail
Tim,
It's definitely the stemmer. I opened Luke and did some analysis in the Plugins section: for the word content, the German analyser produces a stemmed form instead of the full word.
Regards
Ismail
Tim,
My colleague Dawid has created an issue on GitHub: https://github.com/Shazwazza/Examine/issues/54
Regards
Ismail
I've created a pull request for this issue in the Examine repo: