how can i automatically re index pdf documents with examine - Extending Umbraco

Mike Hamilton 6 posts 26 karma points

Jan 07, 2011 @ 22:22

How can I automatically re-Index PDF Documents with Examine?

Hi,

I'm building an Umbraco site that a client is using as a searchable document repository. In my ExamineIndexProvider element in ExamineSettings.config, I have a value of "10" for the interval and runAsync set to "true".

The problem is that the indexer isn't updating automatically when PDFs are added to or removed from the system. The only thing I've found that works is to completely blow away the Index Set folder inside /App_Data and run iisreset on the site. Then everything reappears as it should after a few minutes.

Can anyone offer any advice?

Thanks!
-Mike

Jan Skovgaard 11280 posts 23678 karma points MVP 11x admin c-trib

Jan 07, 2011 @ 22:51

Hi Mike

I'm not really a .NET wizard in any way, but I'm thinking it should be possible to hook into the BeforeSave event when a PDF file is updated. Once you update a PDF file, you must trigger the save event in either the content section or the media section, depending on where you update the files.

Then you of course need to check whether the file type is PDF and then execute the code to re-index the Lucene index.

A detailed list of events to hook into can be found here: http://our.umbraco.org/wiki/reference/api-cheatsheet/using-applicationbase-to-register-events/overview-of-all-events

/Jan

Mike Hamilton 6 posts 26 karma points

Jan 12, 2011 @ 22:01

Hey Jan,

Thanks for your reply. Even with your help, I still wasn't able to get the PDF content to re-index as it should. I ended up creating a .bat script that stopped IIS, deleted the IndexSet folder, then restarted IIS. That worked for us, and we could schedule it to run once a day in the middle of the night.

But! Then I found the new "Examine Index Admin" project (http://our.umbraco.org/projects/backoffice-extensions/examine-index-admin) and it worked even better! It allows us to force a refresh of the whole search index and updates the PDF content just fine. So now when new content is added, we can refresh just the Examine index without having to bring the whole site down.

I still think there is a bug or something in the PDF indexer, however. When you create a page, it automatically gets indexed and becomes available for searching right away. That is NOT happening with PDF documents at the moment. You have to rebuild the whole index even to add just one file...

-Mike

Connie DeCinko 931 posts 1160 karma points

May 12, 2011 @ 00:30

I'm trying to set up an index of SQL data and perhaps have something wrong. When I click to rebuild that index, I get a YSOD:

Object reference not set to an instance of an object.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.

Source Error:


Line 28:             Button clickedButton = (Button)sender;
Line 29:             string indexToRebuild = clickedButton.CommandArgument;
Line 30:             ExamineManager.Instance.IndexProviderCollection[indexToRebuild].RebuildIndex();
Line 31:             result.Text = string.Format("Index {0} added to index queue", indexToRebuild);
Line 32:             result.Visible = true;

Source File: c:\inetpub\wwwroot\usercontrols\Umbraco\ExamineIndexAdmin.ascx.cs Line: 30
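
One possible cause, offered as a guess rather than a diagnosis: line 30 chains the lookup and the call, so if no provider is registered under the name held in `CommandArgument`, the lookup would come back null and `RebuildIndex()` would throw exactly this exception. The sketch below shows the guard pattern only. `IRebuildableIndex`, `IndexRebuilder`, and the dictionary are hypothetical stand-ins so the example compiles on its own; they are not Examine types, and in the real user control the lookup would be the `IndexProviderCollection` indexer from the trace above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an index provider; NOT an Examine type.
public interface IRebuildableIndex
{
    void RebuildIndex();
}

public static class IndexRebuilder
{
    // Looks up the provider by name and rebuilds it only if it exists.
    // Returns the message that the user control would show in result.Text.
    public static string TryRebuild(
        IDictionary<string, IRebuildableIndex> providers,
        string indexToRebuild)
    {
        if (string.IsNullOrEmpty(indexToRebuild))
        {
            return "No index name was supplied";
        }

        IRebuildableIndex provider;
        if (!providers.TryGetValue(indexToRebuild, out provider) || provider == null)
        {
            // Report the missing provider instead of dereferencing null.
            return string.Format(
                "No index provider named '{0}' is registered", indexToRebuild);
        }

        provider.RebuildIndex();
        return string.Format("Index {0} added to index queue", indexToRebuild);
    }
}
```

If the message about a missing provider appears, the name passed by the button and the provider names in ExamineSettings.config would be the first things to compare. If it does not, the exception is probably raised deeper inside the rebuild itself.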

Braydie 148 posts 346 karma points

Dec 12, 2011 @ 17:33

Hi Connie, did you resolve this? I'm running into something similar when trying to reindex content. It seems I can get the reindex to work on one index set but not on any others, and the only difference between them appears to be the excluded Document Types.
