
  • Pav 15 posts 45 karma points
    Nov 21, 2013 @ 11:23
    Pav
    0

    How to remove noise words from indexing?

    I am using CogUmbracoExamineMediaIndexer (https://bitbucket.org/thecogworks/cogumbracoexaminemediaindexer) to search media. I would like to exclude noise words, as well as some specific words of my own, from the index.

     

    My definition is:

    <ExamineLuceneIndexSets>

      <IndexSet SetName="MediaIndexSet" IndexPath="~/App_Data/MediaIndexSet">

        <IndexAttributeFields>

          <add Name="id" />

          <add Name="nodeName" />

          <add Name="updateDate" />

          <add Name="writerName" />

          <add Name="path" />

          <add Name="nodeTypeAlias" />

          <add Name="parentID" />

        </IndexAttributeFields>

        <IncludeNodeTypes>

          <add Name="File" />

        </IncludeNodeTypes>

      </IndexSet>

    </ExamineLuceneIndexSets>

     

    and

    <ExamineIndexProviders>

      <providers>

        <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"

             extensions=".pdf,.docx"

             umbracoFileProperty="umbracoFile"

             youTubeUrlProperty=""/>

      </providers>

    </ExamineIndexProviders>

    <ExamineSearchProviders>

      <add name="MediaSearcher"

               type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"

              indexSet="MediaIndexSet" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>

    </ExamineSearchProviders>

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Nov 21, 2013 @ 11:27
    Ismail Mayat
    100

    Pav,

    Update

    <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
    
           extensions=".pdf,.docx"
    
           umbracoFileProperty="umbracoFile"
    
           youTubeUrlProperty=""/>
    

    to

    <add name="MediaIndexer" type="CogUmbracoExamineMediaIndexer.MediaIndexer, CogUmbracoExamineMediaIndexer"
    
           extensions=".pdf,.docx"
    
           umbracoFileProperty="umbracoFile"
    
           youTubeUrlProperty=""
    
           analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
    
           />
    

    That should work, but you will need to rebuild the index afterwards.
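
    To show why the analyzer matters at index time, here is a rough Python sketch of what a stop-word filter does. It is an illustration only, not Lucene.Net or Examine code: the `analyze` function and its `extra_stop_words` parameter are made up for this example, the tokenizer is far simpler than the real StandardAnalyzer, and the word list is assumed to match Lucene's default English stop words.

```python
import re

# Assumption: mirrors the default English stop-word set used by Lucene's
# StandardAnalyzer; check the Lucene.Net version you run for the exact list.
LUCENE_ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def analyze(text, extra_stop_words=()):
    """Hypothetical helper: lowercase, split into terms, drop stop words."""
    stop_words = LUCENE_ENGLISH_STOP_WORDS | {w.lower() for w in extra_stop_words}
    # Simplified tokenizer: runs of ASCII letters and digits only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


# Terms that would end up in the index with the default list.
print(analyze("The annual report for the board"))

# Same idea with an extra, site-specific word excluded as well.
print(analyze("Draft of the annual report", extra_stop_words=["Draft"]))
```

    Because the filtering happens when a document is written to the index, terms that were indexed before the analyzer was changed stay there until the index is rebuilt.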

    Regards

    Ismail

  • Pav 15 posts 45 karma points
    Nov 21, 2013 @ 12:11
    Pav
    0

    Can I also add my own noise words to exclude?
