indexing pdf media

Matt 7 posts 27 karma points

Mar 11, 2014 @ 13:47

Indexing PDF Media

Hi,

I'm trying to implement the ezSearch package so that visitors to my site can search media as well as content nodes. It works great, except that it doesn't seem to be able to retrieve the contents of the PDF documents it indexes.

On further investigation of the indexes with Luke, the 'contents' field seems to contain lots of seemingly random data, including a GUID, a couple of dates and some file names.

In previous versions of Umbraco, I successfully used the indexer in the Umbraco.Examine.PDF namespace to index my documents. I think it used iTextSharp under the hood: a while back I had to override some of its error checking and recompile the DLL so that it could index a few thousand documents on a site I was managing at the time.

Can anyone help me with a solution to index my PDF documents and their contents in Umbraco v7 please?

Thanks, Matt

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Mar 11, 2014 @ 14:10

Matt,

Are all the PDFs indexed with random info, or just some of them? I recall a disclaimer from Shannon that not all PDFs can be indexed.

Regards

Ismail

Matt 7 posts 27 karma points

Mar 11, 2014 @ 14:35

Hi Ismail,

Thanks for the quick reply.

I've tried three documents now and they're all looking the same in the index. I may have just been really unlucky with the documents I've tried...

Would you expect the ExternalIndexer to index the contents of PDF documents out of the box, or do I need to do something like include an explicit PDF indexer, as I did in my 4.x projects? Do you know whether the Umbraco.Examine.PDF namespace, with its indexing methods, still exists and works?

Thanks again, Matt

Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib

Mar 11, 2014 @ 14:53

Matt,

I thought you were using the PDF indexer? You will need that, as the ExternalIndexer is content only. If you set up the PDF indexer as you used to, then in theory you should get PDFs in the PDF index.
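
For anyone following along, a setup of that kind could look roughly like the sketch below. It is written from memory of the UmbracoExamine.PDF package as used with earlier Umbraco versions, so treat every name as an assumption to check against the assemblies you actually have installed: the PDFIndexer and PDFSearcher provider names, the PDFIndexSet set name, the index path, and the umbracoFile property alias. Each fragment is meant to be added inside the existing element of the named config file, not to replace the file.

```xml
<!-- config/ExamineIndex.config: add inside the root element.
     Assumed set name and path; a separate index set keeps PDF text apart from content. -->
<IndexSet SetName="PDFIndexSet"
          IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs" />

<!-- config/ExamineSettings.config: add inside <ExamineIndexProviders><providers>.
     Assumes the UmbracoExamine.PDF assembly is in /bin and that the media type
     stores its file in a property with the alias "umbracoFile". -->
<add name="PDFIndexer"
     type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
     extensions=".pdf"
     umbracoFileProperty="umbracoFile" />

<!-- config/ExamineSettings.config: add inside <ExamineSearchProviders><providers>.
     Assumed to pair with PDFIndexSet through the shared "PDF" name prefix. -->
<add name="PDFSearcher"
     type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
```

With entries like these in place, the PDF index would normally need a rebuild before a tool such as Luke shows any extracted text in it.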

Regards

Ismail
