  • Matt 7 posts 27 karma points
    Mar 11, 2014 @ 13:47

    Indexing PDF Media

    Hi,

    I'm trying to implement the ezSearch package so that visitors to my site can search content nodes as well as media. It works great, except that it doesn't seem to be able to retrieve the contents of the PDF documents it indexes.

    When I inspected the indexes with Luke, the 'contents' field seemed to contain a lot of unrelated data, including a GUID, a couple of dates and some file names, rather than the text of the PDF.

    In previous versions of Umbraco, I successfully used the indexer in the Umbraco.Examine.PDF namespace to index my documents. I think it used iTextSharp under the hood: a while back I had to override some of its error checking and recompile the DLL so that it could index a few thousand documents on a site I was managing at the time.

    Can anyone help me with a solution to index my PDF documents and their contents in Umbraco v7 please?

    Thanks, Matt

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Mar 11, 2014 @ 14:10

    Matt,

    Are all the PDFs indexed with random info, or just some of them? I recall a disclaimer from Shannon that not all PDFs can be indexed.

    Regards

    Ismail

  • Matt 7 posts 27 karma points
    Mar 11, 2014 @ 14:35

    Hi Ismail,

    Thanks for the quick reply.

    I've tried three documents now and they all look the same in the index. I may just have been really unlucky with the documents I picked...

    Would you expect the ExternalIndexer to index the contents of PDF documents out of the box, or do I need to include an explicit PDF indexer as I did in my 4.x projects? Do you know whether the Umbraco.Examine.PDF namespace and its indexing methods still exist and work?

    Thanks again, Matt

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Mar 11, 2014 @ 14:53

    Matt,

    I thought you were using the PDF indexer? You will need that, as the ExternalIndexer is content only. If you set up the PDF indexer as you used to, then in theory you should get the PDFs in the PDF index.
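
    As a rough sketch of that setup, the fragments below show how the PDF indexer has typically been registered alongside the existing indexers. They assume the UmbracoExamine.PDF assembly is installed in /bin. The names PDFIndexSet, PDFIndexer and PDFSearcher and the index path are illustrative, so check them against the version of the package you install rather than treating this as the definitive configuration.

    ```xml
    <!-- config/ExamineIndex.config: a separate index set for PDF content
         (set name and path are assumptions) -->
    <IndexSet SetName="PDFIndexSet"
              IndexPath="~/App_Data/TEMP/ExamineIndexes/PDFs" />

    <!-- config/ExamineSettings.config, inside <ExamineIndexProviders><providers>:
         indexes media items whose umbracoFile property points at a .pdf file -->
    <add name="PDFIndexer"
         type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
         extensions=".pdf"
         umbracoFileProperty="umbracoFile" />

    <!-- config/ExamineSettings.config, inside <ExamineSearchProviders><providers>:
         a searcher for the PDF index -->
    <add name="PDFSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    ```

    After rebuilding the index, the extracted text should show up in the PDF index in Luke. A search package that only queries the external index would still need to be pointed at the PDF searcher as well.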

    Regards

    Ismail
