  • Ranjit J. Vaity 66 posts 109 karma points
    Apr 25, 2010 @ 22:19

    Indexing contents from PDF files.

    Hi,

    Currently I have a requirement in our project where editors can upload PDF documents through the Umbraco admin section.

    Later, when a user searches for a keyword, the search should cover the content of the PDF documents as well as the content of the Umbraco pages.

    So I need to index the contents of the PDF documents in the media folder.

    After a couple of hours of research, I have seen that we can index using Lucene, but as far as I can tell that only works on text files. So I first need to extract the text from the PDF files and then index the extracted text.

    I am currently looking into iTextSharp, but I have not been able to find a method for extracting text from PDF pages.

    Any suggestions, advice and help will be appreciated.

    Thank you in advance.

    /Ranjit J. Vaity

     

     

  • Dirk De Grave 4541 posts 6021 karma points MVP 3x admin c-trib
    Apr 26, 2010 @ 09:32

    Hi Ranjit,

    Check this project and see if that fits your needs.

     

    Cheers,

    /Dirk

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Apr 26, 2010 @ 10:21

    Also consider generating PDFs from within Umbraco: http://our.umbraco.org/projects/xsl-pdf-creator

    The content nodes should then be indexed out of the box by Examine in Umbraco 4.1.

     

     

     

  • Ranjit J. Vaity 66 posts 109 karma points
    May 02, 2010 @ 17:48

    Hi Dirk / Darren,

    The link you provided has excellent stuff that I can really use.

    Thank you for your replies, guys.

    Cheers,

    /Ranjit J. Vaity
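
The pipeline discussed in this thread (extract the text from each uploaded PDF, then index that text so a keyword search can find the document) can be sketched roughly as follows. This is only an illustration in Python, not the Lucene, Examine, or iTextSharp API: `TextIndex` is a hypothetical in-memory stand-in for a real search index, and `extract_text` is a hypothetical callable that you would back with whatever PDF library you settle on.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its distinct alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TextIndex:
    """Hypothetical stand-in for a real index: maps term -> document paths."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        """Record every term of the extracted text against the document path."""
        for term in tokenize(text):
            self._postings[term].add(path)

    def search(self, query: str) -> Set[str]:
        """Return the documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        matches = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*matches)


def index_pdfs(
    paths: Iterable[str],
    extract_text: Callable[[str], str],  # assumption: supplied by a PDF library
    index: TextIndex,
) -> None:
    """Extract the text of each PDF and add it to the index."""
    for path in paths:
        index.add(path, extract_text(path))
```

Keeping the extractor as a parameter means the indexing step does not care which PDF library produces the text, so the extraction side can be swapped or tested with plain strings.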

     

     

     
