  • Matthew Bettesworth 14 posts 35 karma points
    Dec 06, 2010 @ 11:29
    Matthew Bettesworth
    1

    PDF Indexing with Umbraco Examine

    Hi,

    Thanks to the umbraco.tv videos and some very kind support from this forum, we have managed to set up content, member and PDF indexing/searching on our umbraco 4.5.2 site.

    The problem we are now running into is that when the indexing process runs, it only seems to index 3000 or so of the 7000 PDF documents on our site. I understand that indexing PDFs is quite tricky and that it may not be possible to index them all.

    On checking the log table in the umbraco db, I found that the indexing process is throwing errors in the umbraco.examine.pdf dll when it uses the iTextSharp library.

    I've added a try/catch around the code that throws the exception, hoping the indexer would ignore the error and move on to the next document, which it does for some. I am also writing details of the offending documents to a text file.
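
The skip-and-log idea described above can be sketched roughly as follows. This is Python purely for illustration (the real indexer is .NET code), and `index_all` and `extract_text` are hypothetical names, not part of Examine or iTextSharp. The assumption is that text extraction is the only step that throws.

```python
from typing import Callable, Iterable, List, Tuple


def index_all(
    paths: Iterable[str],
    extract_text: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Try every document and collect failures instead of stopping on them."""
    indexed: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            # Hypothetical stand-in for the PDF text extraction step.
            extract_text(path)
            indexed.append(path)
        except Exception as exc:
            # One defective file must not abort the whole run:
            # record it and carry on with the next document.
            failed.append((path, repr(exc)))
    return indexed, failed
```

Because the loop returns both lists, `len(indexed)` and `len(failed)` give a rough measure of how many documents made it through, and the `failed` list can be written out to a text file for later inspection.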

    I don't mind it ignoring these defective PDFs; we can try to find some commonality between the problem documents at a later stage and get them re-published.

    My main problem is that the indexing process sometimes seems to stop when it finds one of these. Not knowing exactly how or where the indexing process is triggered means I'm at a loss as to how to "force" it through, ignoring any defective documents.

    Is there any way of verifying how many documents we have (according to umbraco) and how many have been successfully indexed, so we can see the scale of the problem?

    If, as I suspect, only 3000 of the 7000 documents are being indexed, is there any way of "forcing" the system to index those it can, whilst ignoring those it can't?


    Thanks for your help.

    Ta,

    Matt

  • Bobby 43 posts 63 karma points
    May 04, 2012 @ 13:24
    Bobby
    0

    Hi Matthew,

    Can you please help me out with indexing PDF files in the media section? I need to search through the PDF files on my website, which are under the media section of Umbraco. I am already able to search the content of the site using Umbraco Examine.

    Thanks

    Bobby

  • MikeD 92 posts 112 karma points
    Aug 08, 2012 @ 21:59
    MikeD
    0

    What I cannot figure out is this: my search results contain content pages AND PDF files. I have to work out which is which and then set the link accordingly.

    FYI, I am using the Razor Cookbook Search as my template.

    Any help would be MUCH appreciated.  I am just learning Razor and am completely lost... lol
