
  • Pav 15 posts 45 karma points
    Nov 21, 2013 @ 11:23
    Pav
    0

    How to remove noise words from indexing?

    I am using CogUmbracoExamineMediaIndexer (https://bitbucket.org/thecogworks/cogumbracoexaminemediaindexer) to search media. I would like to exclude noise words, and some specific words of my own, from the index.

     

    My definition is:

    <ExamineLuceneIndexSets>
      <IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/MediaIndexSet">
        <IndexAttributeFields>
          <add Name="id" />
          <add Name="nodeName" />
          <add Name="updateDate" />
          <add Name="writerName" />
          <add Name="path" />
          <add Name="nodeTypeAlias" />
          <add Name="parentID" />
        </IndexAttributeFields>
        <IncludeNodeTypes>
          <add Name="File" />
        </IncludeNodeTypes>
      </IndexSet>
    </ExamineLuceneIndexSets>

     

    and

    <ExamineIndexProviders>
      <providers>
        <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
             extensions=".pdf,.docx"
             umbracoFileProperty="umbracoFile"
             youTubeUrlProperty=""/>
      </providers>
    </ExamineIndexProviders>

    <ExamineSearchProviders>
      <add name="MediaSearcher"
           type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
           indexSet="MediaIndexSet" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    </ExamineSearchProviders>

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Nov 21, 2013 @ 11:27
    Ismail Mayat
    100

    Pav,

    Update

    <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
         extensions=".pdf,.docx"
         umbracoFileProperty="umbracoFile"
         youTubeUrlProperty=""/>

    to

    <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
         extensions=".pdf,.docx"
         umbracoFileProperty="umbracoFile"
         youTubeUrlProperty=""
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

    That should work, but you will need to rebuild the index afterwards.
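
    A minimal sketch of how the two entries could line up after this change, assuming the searcher should use the same analyzer as the indexer so that query terms are tokenised and filtered the same way as the indexed text. Only attributes already shown in this thread are used, and the enclosing provider elements are left out.

    ```xml
    <!-- Index provider: StandardAnalyzer drops its built-in English stop words at index time -->
    <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
         extensions=".pdf,.docx"
         umbracoFileProperty="umbracoFile"
         youTubeUrlProperty=""
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

    <!-- Search provider: same analyzer, so the same stop words are dropped from queries -->
    <add name="MediaSearcher"
         type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
         indexSet="MediaIndexSet"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    ```

    If the two analyzers differed, a word stripped at index time could still be sent in a query and would never match anything.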

    Regards

    Ismail

  • Pav 15 posts 45 karma points
    Nov 21, 2013 @ 12:11
    Pav
    0

    Can I add my own words to exclude too?
