PDF Indexing with Umbraco Examine

Matthew Bettesworth 14 posts 35 karma points

Dec 06, 2010 @ 11:29

Hi,

Thanks to the umbraco.tv videos and some very kind support from this forum, we have managed to set up content, member and PDF indexing/searching on our Umbraco 4.5.2 site.

The problem we are now running into is that when the indexing process runs, it only seems to index about 3,000 of the 7,000 PDF documents on our site. I understand that indexing PDFs is quite tricky and it may not be possible to index them all.

On checking the log table in the Umbraco database, I found that the indexing process is throwing errors in the umbraco.examine.pdf DLL, at the point where it uses the iTextSharp library.

I've added a try/catch around the code that throws the exception, hoping the indexer would skip the problem document and move on to the next one, which it does for some. I am also writing details of each offending document to a text file.

I don't mind it ignoring these defective PDFs; we can try to find what the problem documents have in common at a later stage and get them republished.
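The skip-and-log approach described above can be sketched roughly like this. It is not the umbraco.examine.pdf code: `documents`, `extract_text` and `add_to_index` are hypothetical stand-ins for the list of PDF media items, the iTextSharp-based text extraction and the index write. The point it shows is that the try/catch sits around each individual document, so one bad file is recorded and skipped rather than ending the whole run.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("pdf-indexing")


def index_all(
    documents: Iterable[Tuple[int, str]],
    extract_text: Callable[[str], str],
    add_to_index: Callable[[int, str], None],
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Index every document that can be read; record the rest instead of stopping.

    documents:    (media id, file path) pairs (hypothetical input format)
    extract_text: hypothetical stand-in for the PDF text extractor
    add_to_index: hypothetical stand-in for writing one entry to the index
    """
    indexed: List[int] = []
    failed: List[Tuple[int, str]] = []
    for doc_id, path in documents:
        try:
            # Only extraction is guarded: a malformed PDF is skipped,
            # but errors while writing to the index still propagate.
            text = extract_text(path)
        except Exception as exc:
            log.warning("Skipping %s (%s): %s", doc_id, path, exc)
            failed.append((doc_id, str(exc)))
            continue
        add_to_index(doc_id, text)
        indexed.append(doc_id)
    return indexed, failed


if __name__ == "__main__":
    # Fake extractor standing in for a real PDF library.
    def fake_extract(path: str) -> str:
        if path.endswith("bad.pdf"):
            raise ValueError("corrupt PDF")
        return "text of " + path

    store = {}
    ok, bad = index_all(
        [(1, "a.pdf"), (2, "bad.pdf"), (3, "c.pdf")], fake_extract, store.__setitem__
    )
    print(ok, bad)  # [1, 3] [(2, 'corrupt PDF')]
```

Keeping the failure list in memory (as well as in a text file) makes it easy to report how many documents were skipped at the end of each run.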

My main problem is that the indexing process sometimes seems to stop altogether when it hits one of these. Because I don't know exactly how or where the indexing process is triggered, I'm at a loss as to how to "force" it through while ignoring any defective documents.

Is there any way of verifying how many documents we have (according to Umbraco) and how many have been successfully indexed, so we can see the scale of the problem?
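Once you have two lists of IDs, one for the PDF media items Umbraco knows about and one for the items actually present in the index, the comparison itself is simple set arithmetic. How you export each list is left open here and is an assumption of this sketch; the function only shows the counting step.

```python
from typing import Dict, Iterable, List, Union


def coverage_report(
    expected_ids: Iterable[int], indexed_ids: Iterable[int]
) -> Dict[str, Union[int, List[int]]]:
    """Compare the documents Umbraco knows about with those found in the index."""
    expected = set(expected_ids)
    indexed = set(indexed_ids)
    return {
        "expected": len(expected),
        "indexed": len(expected & indexed),
        # In Umbraco but absent from the index: candidates to check in the failure log.
        "missing": sorted(expected - indexed),
        # In the index but not in Umbraco's list, e.g. media deleted since indexing.
        "stale": sorted(indexed - expected),
    }


if __name__ == "__main__":
    report = coverage_report([101, 102, 103, 104], [101, 103, 999])
    print(report)
    # {'expected': 4, 'indexed': 2, 'missing': [102, 104], 'stale': [999]}
```

Cross-checking the "missing" IDs against the list of skipped documents shows whether every gap is explained by an extraction failure or whether some documents were never reached at all.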

If only about 3,000 of the 7,000 documents are being indexed, as I suspect, is there any way of "forcing" the system to index those it can while ignoring those it can't?

Thanks for your help.

Ta,

Matt

Bobby 43 posts 63 karma points

May 04, 2012 @ 13:24

Hi Matthew,

Can you please help me work out how to index PDF files in the media section? I need to search through the PDF files on my website, which are stored in Umbraco's media section. I am already able to search the site's content using Umbraco Examine, but not the PDFs.

Thanks

Bobby

MikeD 92 posts 112 karma points

Aug 08, 2012 @ 21:59

0

What I cannot figure out is this: my search results contain both content pages and PDF files. I need to work out which is which and then set the link accordingly.

FYI, I am using the Razor Cookbook Search as my template.

Any help would be MUCH appreciated. I am just learning Razor and am completely lost... lol
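One way to tell the two kinds of hit apart is to branch on a field that records where each result came from. The sketch below shows only that decision logic, in Python rather than Razor. The field names `__IndexType`, `umbracoFile` and `url` are assumptions for illustration; check which fields your content and PDF indexes actually store before relying on them.

```python
from typing import Mapping, Tuple


def result_link(fields: Mapping[str, str]) -> Tuple[str, str]:
    """Decide how to link one search hit, based on its stored fields.

    Assumed field names (verify against your own indexes):
      "__IndexType"  - "media" for hits from the PDF/media index, else content
      "umbracoFile"  - the media item's file path
      "url"          - the content page's URL
    """
    index_type = fields.get("__IndexType", "").lower()
    if index_type == "media":
        # A PDF from the media section: link straight to the file.
        return "media", fields.get("umbracoFile", "")
    # Anything else is treated as a content page.
    return "page", fields.get("url", "")


if __name__ == "__main__":
    # Sample hits with made-up values.
    hits = [
        {"__IndexType": "content", "url": "/about-us/"},
        {"__IndexType": "media", "umbracoFile": "/media/1234/annual-report.pdf"},
    ]
    for hit in hits:
        kind, href = result_link(hit)
        print(kind, href)
```

In the template, the same single check per result decides which link markup to emit.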
