  • TikTakToe 60 posts 102 karma points
    Apr 24, 2015 @ 22:03

    Examine & PDFs not indexing correctly

    I'm getting some weird behaviour. I have about 20 PDFs, and as far as I can tell they contain searchable text (I can search within them in Reader). My config is below. There are no errors, and I can run the index from the dashboard: it does something for a short while, then seems to come back with either zero or four results, apparently at random. A Luke screengrab is below too.

    Any pointers as to where the problem is?

    Umbraco 7.1.8 (using the Hybrid Framework)

    Other indexes on content are working fine.

    Thanks

    <Examine>
      <ExamineIndexProviders>
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
          <!-- default external indexer, which excludes protected and unpublished pages -->
          <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>
    
          <add name="ArticleIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
               indexSet="ArticleIndexSet"/>
    
          <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
               extensions=".pdf"
               umbracoFileProperty="umbracoFile"/>
    
        </providers>
      </ExamineIndexProviders>
    
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
          <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    
          <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcard="true"/>
    
          <add name="ArticleSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" indexSet="ArticleIndexSet"/>
    
          <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    
        </providers>
      </ExamineSearchProviders>
    
    </Examine>
    

    [Screenshot: Examine dashboard]

    [Screenshot: index in Luke]
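
    The providers above are only half of an Examine setup: each indexer also needs an index set in `ExamineIndex.config`. Because `PDFIndexer` and `PDFSearcher` declare no `indexSet` attribute, Examine falls back to its naming convention and looks for a set named `PDFIndexSet`. A minimal sketch of such a set follows; the `IndexPath` and the field list are assumptions for illustration, not values taken from this thread, so adjust them to your site.

    ```xml
    <ExamineLuceneIndexSets>
      <!-- existing index sets (including ArticleIndexSet) stay as they are -->

      <!-- Resolved by naming convention: PDFIndexer / PDFSearcher map to PDFIndexSet -->
      <!-- Assumption: example IndexPath and field list; change them to suit your site -->
      <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs/">
        <IndexAttributeFields>
          <add Name="id" />
          <add Name="nodeName" />
          <add Name="nodeTypeAlias" />
        </IndexAttributeFields>
      </IndexSet>
    </ExamineLuceneIndexSets>
    ```

    If this set is missing or misnamed, the indexer has nowhere sensible to write, so it is worth checking before looking at the PDFs themselves.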

  • TikTakToe 60 posts 102 karma points
    Apr 25, 2015 @ 12:50

    Well, I've now tried the CogUmbracoExamineMediaIndexer. I'm getting results, but when I look at the index in Luke, the fileTextContent field is empty. I've checked the PDFs and I can select the text (i.e. they're not just image-only PDFs).

    Am I to assume that my PDFs are not searchable for some reason?

    [Screenshot: new index in Luke]

  • TikTakToe 60 posts 102 karma points
    Apr 25, 2015 @ 13:24

    OK, I had a further look into it. I can't get the original PDF indexer to work, and the CogUmbracoExamineMediaIndexer runs but leaves fileTextContent empty. So I took a stab: I downloaded the latest version of the iTextSharp core DLL, popped it in the bin folder, re-ran the indexes and bingo! I now have indexed and searchable PDFs.
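
    One caveat with swapping the iTextSharp DLL: iTextSharp is a strong-named assembly, so on the .NET Framework an assembly compiled against an older version (presumably UmbracoExamine.PDF or the Cog indexer here) may fail to load the new one with a `FileLoadException`. If that happens, a binding redirect in `web.config` usually resolves it. The sketch below is an assumption-laden example: the version numbers are placeholders, and the public key token should be checked against the DLL you actually dropped in (for example with `sn -T itextsharp.dll`).

    ```xml
    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <!-- Verify this token against your DLL with: sn -T itextsharp.dll -->
            <assemblyIdentity name="itextsharp" publicKeyToken="8354ae6d2174ddca" culture="neutral" />
            <!-- Placeholder versions: set newVersion to the assembly version of the DLL in /bin -->
            <bindingRedirect oldVersion="0.0.0.0-5.5.5.0" newVersion="5.5.5.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
    </configuration>
    ```

    If the site runs without errors after the DLL swap, as it did here, no redirect is needed.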
