How can I automatically re-Index PDF Documents with Examine?
Hi,
I'm building an Umbraco site that a client is using as a searchable document repository. In my ExamineIndexProvider element in ExamineSettings.config, I have a value of "10" for the interval and runAsync set to "true".
The problem is that the indexer isn't updating automatically when PDFs are added to or removed from the system. The only thing I've found that works is to completely blow away the Index Set folder inside /App_Data and run iisreset on the site. Everything then reappears as it should after a few minutes.
Can anyone offer any advice?
Thanks!
-Mike
Hi Mike
I'm not really a .NET wizard in any way, but I think it should be possible to hook into the BeforeSave event when a PDF file is updated. Once you update a PDF file, the save event should be triggered in either the content section or the media section, depending on where you update the files.
Then you of course need to check whether the file type is PDF and then execute the code to re-index the Lucene index.
A detailed list of events to hook into can be found here: http://our.umbraco.org/wiki/reference/api-cheatsheet/using-applicationbase-to-register-events/overview-of-all-events
/Jan
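A rough sketch of the event-hook idea above, assuming Umbraco 4.x and the Examine API as it appears later in this thread. The class name `PdfReindexEvents` and the provider name `"PDFIndexer"` are assumptions for illustration; use whatever name your ExamineSettings.config actually registers. It hooks the media AfterSave event rather than BeforeSave so that the file is already stored when the index is touched, and it rebuilds the whole PDF index rather than re-indexing a single file.

```csharp
using System;
using Examine;
using umbraco.BusinessLogic;
using umbraco.cms.businesslogic;
using umbraco.cms.businesslogic.media;

// Hypothetical class name. Umbraco 4.x picks up ApplicationBase subclasses at startup.
public class PdfReindexEvents : ApplicationBase
{
    // Assumption: the name of the PDF index provider in ExamineSettings.config.
    private const string PdfIndexerName = "PDFIndexer";

    public PdfReindexEvents()
    {
        // Run our handler every time a media item has been saved.
        Media.AfterSave += Media_AfterSave;
    }

    private static void Media_AfterSave(Media sender, SaveEventArgs e)
    {
        // "umbracoExtension" is the property alias used by the default File media type.
        var extension = sender.getProperty("umbracoExtension");
        if (extension == null || extension.Value == null)
        {
            return;
        }

        // Only react to PDF files.
        if (!string.Equals(extension.Value.ToString(), "pdf", StringComparison.OrdinalIgnoreCase))
        {
            return;
        }

        // Look the provider up by name and skip quietly if it is not registered.
        var indexer = ExamineManager.Instance.IndexProviderCollection[PdfIndexerName];
        if (indexer != null)
        {
            // Coarse approach: rebuilds the whole PDF index, not just this one file.
            indexer.RebuildIndex();
        }
    }
}
```

Removed files would need a similar handler on the media delete events, which is not shown here. On a large repository a full rebuild on every save may be too slow, so treat this as a starting point only.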
Hey Jan,
Thanks for your reply. Even with your help, I still wasn't able to get the PDF content to re-index as it should. I ended up creating a .bat script that stops IIS, deletes the IndexSet folder, then restarts IIS. That worked for us, and we could schedule it to run once a day in the middle of the night.
But! Then I found the new "Examine Index Admin" project (http://our.umbraco.org/projects/backoffice-extensions/examine-index-admin) and it worked even better! It allows us to force a refresh of the whole search index and updates the PDF content just fine. So now when new content is added, we can refresh just the Examine index without having to bring the whole site down.
I still think there is a bug of some kind in the PDF indexer, however. When you create a page, it automatically gets indexed and becomes available for searching right away. That is NOT happening with PDF documents at the moment. You have to rebuild the whole index just to add one file...
-Mike
I'm trying to set up an index of SQL data and perhaps have something wrong. When I click to rebuild that index, I get a YSOD:
Object reference not set to an instance of an object.
Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.
Source Error:
Line 28: Button clickedButton = (Button)sender;
Line 29: string indexToRebuild = clickedButton.CommandArgument;
Line 30: ExamineManager.Instance.IndexProviderCollection[indexToRebuild].RebuildIndex();
Line 31: result.Text = string.Format("Index {0} added to index queue", indexToRebuild);
Line 32: result.Visible = true;
Source File: c:\inetpub\wwwroot\usercontrols\Umbraco\ExamineIndexAdmin.ascx.cs Line: 30
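The trace above only shows that something on line 30 was null. One possibility (an assumption, not confirmed in this thread) is that no index provider is registered under the name passed in, so the lookup returns null before RebuildIndex() is called. The null could equally come from inside RebuildIndex() itself, in which case the full stack trace would show it. A minimal sketch of a guard for the first case, using a hypothetical helper class `IndexRebuildHelper`:

```csharp
using System;
using Examine;

// Hypothetical helper, not part of the Examine Index Admin package.
public static class IndexRebuildHelper
{
    // Returns a status message instead of throwing when the provider name is unknown.
    public static string TryRebuild(string indexToRebuild)
    {
        // The lookup by name yields null when no provider with that name is registered.
        var provider = ExamineManager.Instance.IndexProviderCollection[indexToRebuild];
        if (provider == null)
        {
            return string.Format("No index provider named '{0}' is registered", indexToRebuild);
        }

        provider.RebuildIndex();
        return string.Format("Index {0} added to index queue", indexToRebuild);
    }
}
```

If the message about a missing provider shows up, the name on the rebuild button and the name in ExamineSettings.config are worth comparing.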
Hi Connie, did you resolve this? I'm running into something similar when trying to reindex content. It seems I can get the reindex to work on one index set but not on any others, and the only difference between them appears to be the excluded Document Types.