Examine query generation issue

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Sep 30, 2016 @ 09:05

Guys,

I have the following code:

BaseSearchProvider searcher;
string fieldToSearch = "contents";
string HideFromNavigation = "umbracoNaviHide";

var umbraco = new UmbracoHelper(UmbracoContext.Current);
IPublishedContent currentNode = umbraco.TypedContent(Model.Id);

// If we are on the Chinese site, use the Chinese searcher
if (currentNode.GetCulture().TwoLetterISOLanguageName.ToLower().Contains("zh"))
{
    searcher = ExamineManager.Instance.SearchProviderCollection["External_CN_Searcher"];
}
else
{
    searcher = ExamineHelper.GetMultiSearcher(new[] { "ExternalIndexer", "PDFIndexer" });
}

var criteria = searcher.CreateSearchCriteria(BooleanOperation.And);

// Default to an empty string when no query was supplied
var searchTerm = string.IsNullOrEmpty(Request["query"]) ? string.Empty : Request["query"];
searchTerm = searchTerm.MakeSearchQuerySafe();

if (searchTerm == string.Empty)
{
    <p>Enter search term</p>
}
else
{

    int siteRootId = 0;

    if (Current.Parent == null)
    {
        siteRootId = Current.Id;
    }
    else
    {
        siteRootId = Current.Parent.Id;
    }
    var examineQuery = criteria.Field("SearchablePath", siteRootId.ToString());

    examineQuery.Not().Field(HideFromNavigation, 1.ToString());


    string[] terms = searchTerm.Split(' ');

    examineQuery.And().GroupedOr(new List<string> { fieldToSearch, "FileTextContent" }, terms);

    examineQuery.And().OrderByDescending("reviewDate");

    var results = searcher.Search(examineQuery.Compile());
    <p>@criteria.ToString()</p>
    if (results.Any())
    {
        <p>Your search for "<strong>@searchTerm</strong>" found @results.Count() results</p>
        @RenderResults(results, umbraco)
    }
    else
    {
        <p>No results found for query @searchTerm</p>
    }
}

When my site instance is not Chinese, the code goes down the else branch and gets a MultiSearcher object, and the generated query looks like:

+SearchablePath:1068 -umbracoNaviHide:1 +(contents:umbraco FileTextContent:umbraco)

That is correct. However, when I am on the Chinese site I get the query:

+(contents:test FileTextContent:test)

It seems to be ignoring the parts of the query that I am passing in using the fluent API.

Both the multi-searcher and the searcher from the collection are of type BaseSearchProvider. Has anyone seen this before?

Regards

Ismail


Marc Goodson 2157 posts 14434 karma points MVP 9x c-trib

Oct 03, 2016 @ 21:19

Hi Ismail

If you are using the Lucene.Net Chinese Analyzer from here:

https://lucenenet.apache.org/docs/3.0.3/dir_354f6a4a03ec35feea9a4444b3b86ec9.html

You will see this in the comments of the ChineseFilter:

https://lucenenet.apache.org/docs/3.0.3/d6/dab/chinesefilter8cssource.html

    /// A {@link TokenFilter} with a stop word table.
    /// <ul>
    /// <li>Numeric tokens are removed.</li>
    /// <li>English tokens must be larger than 1 char.</li>
    /// <li>One Chinese char as one Chinese word.</li>
    /// </ul>

So when you are using this analyser, any numeric tokens are filtered out.

This is the bit of code that does the filtering:

    switch (char.GetUnicodeCategory(text[0]))
    {
        case UnicodeCategory.LowercaseLetter:
        case UnicodeCategory.UppercaseLetter:
            // English word/token should larger than 1 char.
            if (termLength > 1)
            {
                return true;
            }
            break;
        case UnicodeCategory.OtherLetter:
            // One Chinese char as one Chinese word.
            // Chinese word extraction to be added later here.
            return true;
    }

So you can hopefully see there is no case for a UnicodeCategory of DecimalDigitNumber.

You are trying to filter by decimal digits in both SearchablePath and umbracoNaviHide. Even though the values are passed as strings, each one is still run through the analyser, so these tokens are stripped by the filter and the clauses never appear in your generated query.
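To make that concrete, here is a minimal standalone sketch that mirrors only the category switch quoted above. The class and method names (ChineseFilterSketch, KeepsToken) are made up for illustration, and it ignores the tokenizer and the stop word table, so treat it as a rough model of the filter rather than the real Lucene.Net pipeline.

```csharp
using System;
using System.Globalization;

public static class ChineseFilterSketch
{
    // Hypothetical helper: returns true when the quoted switch would let
    // the term through. Not part of Lucene.Net or Examine.
    public static bool KeepsToken(string term)
    {
        if (string.IsNullOrEmpty(term))
        {
            return false;
        }

        switch (char.GetUnicodeCategory(term[0]))
        {
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
                // English tokens must be longer than one character.
                return term.Length > 1;
            case UnicodeCategory.OtherLetter:
                // A single Chinese character counts as a word.
                return true;
            default:
                // Everything else, including DecimalDigitNumber, is dropped.
                return false;
        }
    }

    public static void Main()
    {
        foreach (var term in new[] { "test", "a", "中", "1", "1068" })
        {
            Console.WriteLine("{0} -> {1}", term, KeepsToken(term) ? "kept" : "dropped");
        }
    }
}
```

Under this model "test" and "中" are kept, while "a", "1" and "1068" are dropped, which lines up with the Chinese query containing only the contents and FileTextContent clauses.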

So in theory you can get around this by compiling your own version of ChineseFilter.cs with a case to handle numbers and allow them as tokens...

    switch (Char.GetUnicodeCategory(text[0]))
    {
        case UnicodeCategory.LowercaseLetter:
        case UnicodeCategory.UppercaseLetter:
            // English word/token should larger than 1 character.
            if (text.Length > 1)
            {
                return token;
            }
            break;
        case UnicodeCategory.DecimalDigitNumber:
            // Added case: let numeric tokens through.
            return token;
        case UnicodeCategory.OtherLetter:
            // One Chinese character as one Chinese word.
            // Chinese word extraction to be added later here.
            return token;
    }

This will allow you to pass 1 to umbracoNaviHide and 1068 to SearchablePath, but to be honest I've no idea if this is a good thing to do or not :-) Hopefully, though, it explains the puzzle of what is going on!
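As a rough check of that claim, the sketch below models the patched switch and applies it to the three clauses from the original question. Everything here (PatchedFilterSketch, KeepsToken, SurvivingClauses) is a hypothetical stand-in, assuming a clause survives exactly when its value passes the filter; the real query parser and analyser do considerably more than this.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PatchedFilterSketch
{
    // Hypothetical model of the patched switch: digits are now accepted.
    public static bool KeepsToken(string term)
    {
        if (string.IsNullOrEmpty(term))
        {
            return false;
        }

        switch (char.GetUnicodeCategory(term[0]))
        {
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.UppercaseLetter:
                return term.Length > 1;
            case UnicodeCategory.DecimalDigitNumber: // the added case
            case UnicodeCategory.OtherLetter:
                return true;
            default:
                return false;
        }
    }

    // Keeps only the field:value clauses whose value survives the filter.
    public static string SurvivingClauses(IEnumerable<KeyValuePair<string, string>> clauses)
    {
        return string.Join(" ", clauses
            .Where(c => KeepsToken(c.Value))
            .Select(c => c.Key + ":" + c.Value));
    }

    public static void Main()
    {
        var clauses = new[]
        {
            new KeyValuePair<string, string>("SearchablePath", "1068"),
            new KeyValuePair<string, string>("umbracoNaviHide", "1"),
            new KeyValuePair<string, string>("contents", "test"),
        };

        Console.WriteLine(SurvivingClauses(clauses));
    }
}
```

With the extra case in place all three clauses come through in this model. A custom filter would also need to be applied at index time for the Chinese index, since the indexed values pass through the same analyser.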

regards

Marc
