examine pdf not indexing pdfs


karen 186 posts 461 karma points

Feb 24, 2016 @ 17:20
0

Examine PDF not indexing PDFs
Using Umbraco v6.2.5.

I just installed Umbraco Examine PDF using NuGet. (I also tried uninstalling and then reinstalling it, as some other posts suggested.)

I am looking at the index using Luke, and it does not look like it is actually indexing any PDFs.

For example, I see a NodeId of 1221, but NodeId=1221 is a folder containing PDFs. None of the PDFs in that folder are indexed (based on comparing the NodeId of each PDF with what Luke displays).

Going down the list of NodeIds listed in Luke, it looks like they are all folders in the media section.

Any thoughts on why that is happening? I have seen lots of unanswered questions about Examine and PDFs, but none describing this exact problem. (Perhaps the people who say they only see the NodeId and no other fields are hitting the same issue without realizing it.)

I am using the default values that the NuGet installer put in the config:
```
<add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" extensions=".pdf" umbracoFileProperty="umbracoFile"/>

<add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>

<IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs"/>
```
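One thing that can be checked from the config alone is whether each provider can find its index set. The sketch below assumes Examine's naming convention (a provider named `PDFIndexer` or `PDFSearcher` with no explicit `indexSet` attribute looks for a set called `PDFIndexSet`); that convention is an assumption here, not something confirmed in this thread. The function names are made up for illustration, and the three fragments are wrapped in a dummy `<root>` element only so they parse as one document.

```python
import xml.etree.ElementTree as ET

# The three fragments from the post, wrapped in a dummy root element.
CONFIG = """
<root>
  <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" extensions=".pdf" umbracoFileProperty="umbracoFile"/>
  <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>
  <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs"/>
</root>
"""


def expected_set_name(provider_name, explicit=None):
    """Return the index set a provider is assumed to look for.

    Assumed convention: 'PDFIndexer' / 'PDFSearcher' -> 'PDFIndexSet',
    unless an explicit indexSet attribute overrides it.
    """
    if explicit:
        return explicit
    for suffix in ("Indexer", "Searcher"):
        if provider_name.endswith(suffix):
            return provider_name[: -len(suffix)] + "IndexSet"
    return provider_name


def check_providers(xml_text):
    """Map each provider name to True if its expected index set is declared."""
    root = ET.fromstring(xml_text)
    set_names = {el.get("SetName") for el in root.iter("IndexSet")}
    report = {}
    for el in root.iter("add"):
        name = el.get("name")
        report[name] = expected_set_name(name, el.get("indexSet")) in set_names
    return report


if __name__ == "__main__":
    for provider, ok in check_providers(CONFIG).items():
        print(provider, "-> index set found" if ok else "-> NO matching index set")
```

With the default values above, both providers resolve to `PDFIndexSet`, so under that assumption a naming mismatch is unlikely to be the cause here.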
karen 186 posts 461 karma points

Feb 26, 2016 @ 19:26

0

Well, an interesting follow-up. I removed the NuGet package and added the DLLs manually from this page: https://github.com/umbraco/UmbracoExamine.PDF/releases/tag/v1.0.0

Now I have a different set of index entries, including some with the 'FileTextContent' field. However, it is still not indexing ALL the PDFs. I uploaded a new PDF and the index did not change; I tried re-indexing via the Developer tab, but still no change (same number of items). Some of the indexed items are still folders, not actual PDF files.

Going to grab the source and see what I can find out from there.
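To compare "same number of items" in Luke against what is actually on disk, a rough count of the PDF files under the media folder helps. This is a minimal sketch, assuming the uploaded files live under a local `media` directory; `count_pdfs` is a made-up helper name. It groups the count by the exact extension spelling, because the config lists `extensions=".pdf"` and I do not know whether that comparison is case-sensitive.

```python
from collections import Counter
from pathlib import Path


def count_pdfs(media_root):
    """Count files ending in .pdf (any letter case), grouped by exact extension."""
    counts = Counter()
    for path in Path(media_root).rglob("*"):
        # Match case-insensitively, but record the spelling as it appears on disk.
        if path.is_file() and path.suffix.lower() == ".pdf":
            counts[path.suffix] += 1
    return counts
```

Calling `count_pdfs("media")` from the site root and summing the values gives the number of documents one would expect in the PDF index; any entries under `.PDF` or `.Pdf` are worth checking individually in Luke.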

colin gray 16 posts 56 karma points

May 23, 2016 @ 16:22

1

I had my development site indexing PDFs, but after a CMS refresh from NuGet (for another issue), PDFs would no longer index (Developer > Examine > indexing). So I "added the dlls manually from this page": https://github.com/umbraco/UmbracoExamine.PDF/releases/tag/v1.0.0 and it would re-index again!

Your issue: if it's indexing some but not all, I suspect the PDF structure of the failing files. Is the PDF text broken up by formatting within words? Try publishing an unformatted version of the same PDF. If you need the fancy layout, try adding all the text as plain text, white on white, 4pt, on the last 3 pages, etc.
