  • Tom 713 posts 954 karma points
    Jan 21, 2010 @ 02:30

    search documents including pdfs?

    Hi, I was wondering what the best way is to implement a search that includes indexed PDFs. I've seen things like Umbraco Examine and was wondering how you'd integrate PDFBox or something similar to get it to include PDF files as well.

    Regards,

    Tom

  • Dirk De Grave 4541 posts 6021 karma points MVP 3x admin c-trib
    Jan 21, 2010 @ 10:28

    Have a look at umbSearch2 which has this functionality built-in!

     

    Cheers,

    /Dirk

  • Allan James 20 posts 40 karma points
    Jan 22, 2012 @ 18:10

    Dirk,

    I recently completed both levels of training and am now certified, but for the life of me I cannot see where I can set up a search that will ALSO search inside PDFs or Word documents. I understand that I will have to set up an iFilter for these types, but umbSearch2 appears to be a dead project and also seems not to work in 4.7.

    The XSLT search obviously doesn't search within documents... so where is this capability? Wouldn't one think that this is something many of us need, or am I the only person?

    Could you please point me in the right direction?

    Allan

     

  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Jan 23, 2012 @ 17:43

    @Allan,

    If you download the latest version of Examine from http://examine.codeplex.com then you'll see that it has support for indexing PDFs as well as Umbraco content. It doesn't do Word documents though. By default you have to have a separate index for your PDFs, but you can combine multiple indexes in Examine using a Lucene MultiIndexSearcher.

    It's a bit fiddly to set up, but once you get it working it's pretty straightforward. I think the CodePlex site has examples of indexing PDF content. One trick that you can use to make searching multiple indexes easier is to attach to the NodeIndexing event of the PDF index and create fields with the same names as the content index fields that you want to search on. For example, use the media file name as the page title and the contents of the file as the page copy. That way you can do a combined search on a single set of fields.
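
    To make the field-name trick concrete, here is a minimal sketch in plain Python. It is not the Examine or Lucene API: the indexes are just lists of dictionaries, and every name in it (FIELD_MAP, normalise_pdf_entry, search, and the field names fileName, fileText, pageTitle and pageCopy) is made up for illustration. It only shows the idea: copy the PDF fields under the names the content index already uses, then run one search over the same fields in both indexes.

```python
# Illustrative only: plain Python, not the Examine or Lucene API.
# All names below (FIELD_MAP, normalise_pdf_entry, search, field names)
# are hypothetical and chosen for this sketch.

# Maps field names in the PDF index to the names used by the content index.
FIELD_MAP = {"fileName": "pageTitle", "fileText": "pageCopy"}


def normalise_pdf_entry(entry):
    """Return a copy of a PDF entry that also carries the content index's field names."""
    normalised = dict(entry)
    for pdf_field, content_field in FIELD_MAP.items():
        if pdf_field in entry:
            normalised[content_field] = entry[pdf_field]
    return normalised


def search(indexes, term, fields=("pageTitle", "pageCopy")):
    """Search every index on the same fields; return (index name, entry id) pairs."""
    term = term.lower()
    hits = []
    for index_name, entries in indexes.items():
        for entry in entries:
            # Case-insensitive substring match on any of the shared fields.
            if any(term in entry.get(field, "").lower() for field in fields):
                hits.append((index_name, entry["id"]))
    return hits


content_index = [
    {"id": "1050", "pageTitle": "Annual report", "pageCopy": "Summary of the year."},
    {"id": "1051", "pageTitle": "Contact", "pageCopy": "How to reach us."},
]
pdf_index = [
    normalise_pdf_entry(
        {"id": "2001", "fileName": "annual-report.pdf",
         "fileText": "Full annual report with figures."}
    ),
]

results = search({"content": content_index, "pdf": pdf_index}, "annual")
print(results)
```

    In a real Examine setup the copying step would presumably live in the NodeIndexing handler of the PDF index, and the combined search would go through the multi-index searcher rather than a loop like this one.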

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 23, 2012 @ 18:02

    Tim, Allan,

    Before cross-index searching, that's how I did PDF searching with the main index. I would now go down the two-index route, as the latest version of Examine can cross-search. To index and search Word docs you would need to create your own indexer/searcher; take a look at the code for the PDF searcher. To extract the Word content you can, as you have already observed, use an iFilter.

    Regards

     

    Ismail

  • Tim 1193 posts 2675 karma points MVP 4x c-trib
    Jan 23, 2012 @ 18:06

    Ismail, is the cross-index searching stuff the same as the MultiIndexSearcher, or something different? If it's different, can you provide some details?

    :)

  • Allan James 20 posts 40 karma points
    Jan 24, 2012 @ 05:04

    @Ismail or @Tim or @Anyone

    I am desperate to have this site of mine be able to search the site AND within the contents of PDF files. I am willing to pay for someone to implement what I need to get this done. My site (staging) is at http://crdha.allanlevsen.com. I guess I would need you to make the changes to the Examine config files, add any DLLs I need, and build a search control and results page.

    I would REALLY appreciate this and need it quickly. My email is [email protected] and I would pay you through PayPal if that works.

    Thanks,

    Allan
