PDF Indexing with Umbraco Examine
Hi,
Thanks to the umbraco.tv videos and some very kind support from this forum, we have managed to set up content, member and PDF indexing/searching on our Umbraco 4.5.2 site.
The problem we are now running into is that when the indexing process runs, it only seems to index 3,000 or so of the 7,000 PDF documents on our site. I understand that indexing PDFs can be tricky and that it may not be possible to index them all.
On checking the log table in the Umbraco database, I found that the indexing process was throwing errors in the umbraco.examine.pdf DLL when it used the iTextSharp library.
I've added a try/catch around the code causing the exception so that the system ignores the error, hoping it would move on to the next document, which it does in some cases. I am also writing details of the offending documents to a text file.
I don't mind it ignoring these defective PDFs; at a later stage we can try to find what the problem documents have in common and get them re-published.
My main problem is that the indexing process sometimes seems to stop when it hits one of these. Not knowing exactly how or where the indexing process is triggered, I'm at a loss as to how to "force" it through while ignoring any defective documents.
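One common reason a batch job stops at the first bad file is that the exception escapes the loop instead of being caught per document. The sketch below shows the skip-and-log pattern in Python as a language-neutral illustration; it is not the UmbracoExamine.PDF or iTextSharp code, and `extract_text` and `add_to_index` are hypothetical stand-ins for the PDF text extractor and the index write.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("pdf-indexing")


def index_all(
    docs: Iterable[Tuple[int, str]],
    extract_text: Callable[[str], str],
    add_to_index: Callable[[int, str], None],
) -> Tuple[int, List[Tuple[int, str]]]:
    """Index every document it can, skipping (and recording) the ones that fail.

    extract_text and add_to_index are HYPOTHETICAL stand-ins for the PDF text
    extractor and the index write; they are not Umbraco or iTextSharp APIs.
    """
    indexed = 0
    failures: List[Tuple[int, str]] = []
    for doc_id, path in docs:
        # The try/except wraps ONE document, so a defective PDF only skips
        # itself; the loop always carries on with the next item.
        try:
            text = extract_text(path)
            add_to_index(doc_id, text)
            indexed += 1
        except Exception as exc:  # broad on purpose: any failure skips this doc
            log.warning("Skipping document %s (%s): %s", doc_id, path, exc)
            failures.append((doc_id, path))
    return indexed, failures
```

Calling `logging.basicConfig(filename="pdf-failures.log")` (the file name is just an example) before the run sends the warnings to a text file, much like the log of offending documents described above, while the returned `failures` list holds the same information in memory.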
Is there any way of verifying how many documents we have (according to Umbraco) and how many have been successfully indexed, so we can see the scale of the problem?
If only 3,000 of the 7,000 documents are being indexed, as I suspect, is there any way of "forcing" the system to index those it can while ignoring those it can't?
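Answering the "how many" question comes down to comparing two lists of IDs: the PDF media items the site holds and the IDs actually present in the PDF index. A minimal Python sketch, assuming both lists have already been exported (for example, by querying the Umbraco database and by browsing the index with a Lucene tool such as Luke; both routes are assumptions, not something confirmed in this thread):

```python
from typing import Dict, Iterable, List, Union


def reconcile(
    source_ids: Iterable[int], indexed_ids: Iterable[int]
) -> Dict[str, Union[int, List[int]]]:
    """Compare what the CMS holds with what the index holds.

    source_ids:  IDs of the PDF media items the site knows about.
    indexed_ids: IDs found in the PDF index.
    How each list is obtained is left open (ASSUMPTION).
    """
    source = set(source_ids)
    indexed = set(indexed_ids)
    return {
        "total": len(source),                  # PDFs the site has
        "indexed": len(source & indexed),      # of those, how many are indexed
        "missing": sorted(source - indexed),   # never made it into the index
        "orphaned": sorted(indexed - source),  # in the index but no longer on the site
    }
```

The `missing` list doubles as a list of documents to retry or inspect, which complements the text file of failures.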
Thanks for your help.
Ta,
Matt
Hi Mathew,
Could you please help me index the PDF files in the media section? I need to search through the PDF files on my website, which are stored in Umbraco's media section. I can already search the site's content using Umbraco Examine, but I don't know how to index the PDF files.
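Conceptually, indexing PDFs from the media section involves two steps: pick out the media items whose file is a PDF, then add the text extracted from each one to an index that searches can query. On an Umbraco 4.x site those steps are handled by the UmbracoExamine.PDF indexer mentioned earlier in this thread, typically configured in Examine's config files alongside the content index. The toy Python sketch below is not that configuration, only an illustration of the idea; `extract_text` is a hypothetical stand-in for a PDF text extractor such as iTextSharp.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple


def build_pdf_index(
    media: Iterable[Tuple[int, str]], extract_text: Callable[[str], str]
) -> Dict[str, Set[int]]:
    """Build a toy inverted index (word -> media IDs) from PDF media items only."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for media_id, path in media:
        if not path.lower().endswith(".pdf"):
            continue  # only PDF files are indexed; other media is skipped
        for word in re.findall(r"\w+", extract_text(path).lower()):
            index[word].add(media_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return IDs of documents containing every word in the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy, so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A real Lucene/Examine index adds analyzers, stemming and relevance scoring on top of this, but the filter-extract-index flow is the same basic shape.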
Thanks
Bobby
What I can't figure out is this: my search results contain both content pages AND PDF files, and I need to tell which is which so I can set the link accordingly.
FYI, I am using the Razor Cookbook Search as my template.
Any help would be MUCH appreciated. I am just learning Razor and am completely lost... lol
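Telling content pages from PDFs in a mixed result list means branching on some field that records where each hit came from. The Python sketch below models each hit as a dictionary of fields. It assumes (unverified here) that each hit carries an `__IndexType` field set to `content` or `media`, and that PDF hits store the file path in `umbracoFile`; check the field names against your own index. `nice_url` is a hypothetical stand-in for a page-URL lookup such as `umbraco.library.NiceUrl`.

```python
from typing import Callable, Mapping


def result_link(fields: Mapping[str, str], nice_url: Callable[[int], str]) -> str:
    """Choose the link for one search hit.

    ASSUMPTION: "__IndexType" says where the hit came from, and media (PDF)
    hits keep their file path in "umbracoFile".
    """
    if fields.get("__IndexType") == "media":
        return fields["umbracoFile"]    # PDF: link straight to the file
    return nice_url(int(fields["id"]))  # content page: resolve its friendly URL
```

If the content and PDF indexes are queried through separate searchers, an even simpler option is to tag each hit with its source while merging the two result lists, so no field inspection is needed.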