
  • Victor 10 posts 61 karma points
    Aug 20, 2013 @ 11:33

    Cannot find numeric values when performing a search using ArabicAnalyzer Indexer

    Hi,

    In an Umbraco 6.1.2 project we are searching indexed values using Examine and Lucene queries.
    Our values have been indexed using Lucene.Net.Analysis.AR.ArabicAnalyzer.

    When we query for numeric values we get no results, whereas queries for string values
    work without any problem.

    These are our configuration settings.

    In ExamineSettings.config we have defined:

        <add name="ArabicIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
             supportUnpublished="false"
             supportProtected="false"
             interval="15"
             analyzer="Lucene.Net.Analysis.AR.ArabicAnalyzer, Lucene.Net.Contrib.Analyzers"
             indexSet="ArabicIndexSet" />

        <add name="ArabicSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
             analyzer="Lucene.Net.Analysis.AR.ArabicAnalyzer, Lucene.Net.Contrib.Analyzers"
             enableLeadingWildcards="true" />

    In ExamineIndex.config we have defined:

        <IndexSet SetName="ArabicIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Arabic/">
            <IndexAttributeFields>
              <add Name="id" />
              <add Name="sortOrder" />
              <add Name="nodeName" EnableSorting="true" />
              <add Name="updateDate" />
              <add Name="writerName" />
              <add Name="path" />
              <add Name="nodeTypeAlias" />
              <add Name="parentID" />
            </IndexAttributeFields>
            <IndexUserFields>
              <add Name="_AllContents" />
              <add Name="nameEng" />
              <add Name="cityName" />
              <add Name="countryName" />
              ...
            </IndexUserFields>
            <IncludeNodeTypes>
              <add Name="City" />
              <add Name="Country" />
              <add Name="Article" />
            </IncludeNodeTypes>
        </IndexSet>


    We have an indexed country node (UAE) with an id of 1365.
    Using Luke.Net we can test our queries against the index:

    When we query for id=1365, we get no results.

    When we query for the same node with nodeName=UAE, we get results,
    and can see that the field id=1365 exists!

    If we run the same queries with Lucene.Net.Analysis.Standard.StandardAnalyzer
    against the same indexed values, we have no problems and get all the results.
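
    For anyone who wants to reproduce that comparison in Examine rather than in Luke, a second
    searcher over the same index set might look roughly like the sketch below. The name
    StandardSearcher is only a label made up for illustration, and the assembly name Lucene.Net
    is an assumption about the stock Lucene.Net build shipped with Umbraco. We have not confirmed
    that this is the right fix, only that StandardAnalyzer finds the numeric values.

        <!-- Hypothetical searcher name; indexSet points at the existing Arabic index set. -->
        <add name="StandardSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
             analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
             indexSet="ArabicIndexSet" />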

    Our search led us to this issue:
    https://issues.apache.org/jira/browse/LUCENE-2049

    where it says:

    "arabic stem filter does not remove numbers. instead, the tokenizer is based on LetterTokenizer, which does not index numbers"

    As we saw, the id was indexed; our problem is searching for it.

    How can we search for a numeric value that was indexed using Lucene.Net.Analysis.AR.ArabicAnalyzer?

