
  • TikTakToe 60 posts 102 karma points
    Apr 24, 2015 @ 22:03

    Examine & PDFs, not indexing correctly

    I'm seeing some odd behaviour. I have about 20 PDFs and, as far as I can tell, they all contain searchable text (I can search within them in Reader). My config is below. There are no errors, and I can rebuild the index from the dashboard: it runs for a short while, then comes back with either zero or 4 results, seemingly at random. A Luke screengrab is below too.

    Any pointers to where the problem is?

    Umbraco 7.1.8 (using the Hybrid Framework).

    The other indexes on content are working fine.

    Thanks

    <Examine>
      <ExamineIndexProviders>
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
          <!-- default external indexer, which excludes protected and unpublished pages -->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>
    
          <add name="ArticleIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
               indexSet="ArticleIndexSet"/>
    
          <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" extensions=".pdf" umbracoFileProperty="umbracoFile" />
    
        </providers>
      </ExamineIndexProviders>
    
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
          <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    
          <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true"/>
    
          <add name="ArticleSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" indexSet="ArticleIndexSet"/>
    
          <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    
    
        </providers>
      </ExamineSearchProviders>
    
    </Examine>
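
    For reference, a minimal sketch of the index set that the PDFIndexer and PDFSearcher entries above would pair with in ExamineIndex.config. It is not part of the posted config. It assumes Examine's naming convention, where a provider named PDFIndexer with no indexSet attribute looks for a set named PDFIndexSet, and the IndexPath shown is only an illustrative location.

    ```xml
    <ExamineLuceneIndexSets>
      <!-- Assumed set name: derived from "PDFIndexer" / "PDFSearcher" by naming convention -->
      <!-- Assumed path: illustrative only, any writable folder under the site should do -->
      <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs/" />
    </ExamineLuceneIndexSets>
    ```

    If the set name and the provider names do not line up, adding an explicit indexSet attribute to both providers (as the ArticleIndexer and ArticleSearcher entries do) is the usual way to tie them together.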
    

    [Screenshot: dashboard]

    [Screenshot: Luke]

  • TikTakToe 60 posts 102 karma points
    Apr 25, 2015 @ 12:50

    Well, I've now tried the CogUmbracoExamineMediaIndexer. I'm getting results, but when I look at the index in Luke, fileTextContent is empty. I've checked the PDFs and I can select the text (i.e. they're not image-only PDFs).

    Am I to assume that my PDFs are not searchable for some reason?

    [Screenshot: new index viewed in Luke]

  • TikTakToe 60 posts 102 karma points
    Apr 25, 2015 @ 13:24

    OK, I looked into this further. I can't get the original PDF indexer to work. The CogUmbracoExamineMediaIndexer works but leaves fileTextContent empty, so I took a stab: I downloaded the latest version of the iTextSharp core DLL, dropped it into the bin folder, re-ran the indexes, and bingo! I now have indexed and searchable PDFs.
