/// A {@link TokenFilter} with a stop word table.
/// <ul>
/// <li>Numeric tokens are removed.</li>
/// <li>English tokens must be larger than 1 char.</li>
/// <li>One Chinese char as one Chinese word.</li>
/// </ul>
So when you are using this analyser, any decimal numbers are filtered out of the tokens.
This is the bit of code that does the filtering:
switch (char.GetUnicodeCategory(text[0]))
{
    case UnicodeCategory.LowercaseLetter:
    case UnicodeCategory.UppercaseLetter:
        // English word/token should larger than 1 char.
        if (termLength > 1)
        {
            return true;
        }
        break;
    case UnicodeCategory.OtherLetter:
        // One Chinese char as one Chinese word.
        // Chinese word extraction to be added later here.
        return true;
}
So you can hopefully see there is no case for a UnicodeCategory of DecimalDigitNumber.
You are trying to filter by a decimal digit number for both searchablePath and umbracoNaviHide. Even if the values are passed as strings, the filter looks at the Unicode category of the first character of each token, so these tokens are not present in your generated query: their values are stripped by this filter.
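If you want to see the effect in isolation, here is a minimal sketch that mirrors the switch above without needing Lucene at all. The `ChineseFilterSketch` class, the `KeepsToken` helper and the sample tokens are hypothetical names made up for illustration; the only real framework call is `char.GetUnicodeCategory`. It also leaves out the stop word table check that the real filter does first.

```csharp
using System;
using System.Globalization;

public static class ChineseFilterSketch
{
    // Hypothetical helper: mirrors the switch quoted from ChineseFilter,
    // it is NOT the real Lucene.Net class and skips the stop word table.
    public static bool KeepsToken(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;

        switch (char.GetUnicodeCategory(text[0]))
        {
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
                // Latin tokens must be longer than one character.
                return text.Length > 1;
            case UnicodeCategory.OtherLetter:
                // A single Chinese character counts as a word.
                return true;
            default:
                // Everything else, including DecimalDigitNumber, is dropped.
                return false;
        }
    }

    public static void Main()
    {
        // "\u4e2d" is a single Chinese character (category OtherLetter).
        foreach (var token in new[] { "1", "1068", "a", "home", "\u4e2d" })
        {
            Console.WriteLine("{0} -> {1}", token, KeepsToken(token) ? "kept" : "dropped");
        }
    }
}
```

Both "1" and "1068" start with a character in the DecimalDigitNumber category, so they fall into the default branch and are dropped, which matches the missing parts of the generated query.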
So in theory you can get around this by compiling your own version of ChineseFilter.cs with a case to handle numbers and allow them as tokens...
switch (Char.GetUnicodeCategory(text[0]))
{
    case UnicodeCategory.LowercaseLetter:
    case UnicodeCategory.UppercaseLetter:
        // English word/token should be larger than 1 character.
        if (text.Length > 1)
        {
            return token;
        }
        break;
    case UnicodeCategory.DecimalDigitNumber:
        // New case: let tokens that start with a decimal digit through.
        return token;
    case UnicodeCategory.OtherLetter:
        // One Chinese character as one Chinese word.
        // Chinese word extraction to be added later here.
        return token;
}
// Note: "return token" fits the filter version whose method returns a Token.
// In the bool-returning version quoted above, use "return true" instead.
This will allow you to pass 1 to umbracoNaviHide and 1068 to searchablePath. To be honest, I've no idea if this is a good thing to do or not :-) but hopefully it explains the puzzle of what is going on!
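As a quick sanity check of the idea, here is a small standalone sketch of the patched rule. Again, `PatchedFilterSketch` and `KeepsToken` are hypothetical names for illustration only, not part of Lucene.Net or Examine, and the stop word table lookup is left out.

```csharp
using System;
using System.Globalization;

public static class PatchedFilterSketch
{
    // Hypothetical helper mirroring the patched switch; not the real filter.
    public static bool KeepsToken(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;

        switch (char.GetUnicodeCategory(text[0]))
        {
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
                // Latin tokens must still be longer than one character.
                return text.Length > 1;
            case UnicodeCategory.DecimalDigitNumber:
                // Added case: tokens starting with a digit are now kept.
                return true;
            case UnicodeCategory.OtherLetter:
                return true;
            default:
                return false;
        }
    }

    public static void Main()
    {
        foreach (var token in new[] { "1", "1068", "a" })
        {
            Console.WriteLine("{0} -> {1}", token, KeepsToken(token) ? "kept" : "dropped");
        }
    }
}
```

One thing to keep in mind: only the first character is checked, so any token that merely starts with a digit (for example "1a") would also get through, not just pure numbers.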
Examine query generation issue
Guys,
I have the following code:
When my site instance is not Chinese, so the code goes down the path of getting a MultiSearcher object, I get a generated query that looks like:
That is correct. However, when I am in the Chinese site I get the query:
It seems to be ignoring the parts of the query that I am passing in using the fluent API.
The multisearcher and searcher from collection are types of BaseSearchProvider. Anyone seen this before?
Regards
Ismail
Hi Ismail
If you are using the Lucene.net Chinese Analyzer from here:
https://lucenenet.apache.org/docs/3.0.3/dir_354f6a4a03ec35feea9a4444b3b86ec9.html
You will see this in the comments of the ChineseFilter:
https://lucenenet.apache.org/docs/3.0.3/d6/dab/chinesefilter8cssource.html
regards
Marc