Umbraco Examine Search Setup
Hi.
I am trying to set up search using Examine in Umbraco 4.0.3 as instructed on http://farmcode.org/?tag=/umbraco+examine
However, it seems that the index is not updated when I publish content. The index file "segments" is only modified when I specify the IndexPath in web.config.
There is another index file in the data/_systemUmbracoIndexDontDelete directory, which is modified when I publish new content.
Am I using old documentation that is no longer valid?
Cheers
Morten
Which release of Examine are you using?
Umbraco 4.0.x had its own Lucene.NET indexer in it, which is what generates the /data/_system... folder.
Can you post your Examine config?
Thanks for the quick reply :-)
I am using version 4 (UmbracoExaminev4).
The configuration is as follows:
<configuration>
  <configSections>
    .
    .
    <section name="UmbLuceneIndex" type="TheFarm.Umbraco.Lucene.Common.Configuration.IndexSets, TheFarm.Umbraco.Lucene.Common" />
  </configSections>
  <!-- DefaultIndexSet,EnableDefaultActionHandler: REQUIRED -->
  <UmbLuceneIndex DefaultIndexSet="forside" EnableDefaultActionHandler="true">
    <!-- REQUIRED: MaxResults,IndexPath,SetName -->
    <!-- NOT Required: IndexParentId. If not specified then all documents are indexed, otherwise only documents as children of the id are indexed -->
    <IndexSet SetName="forside" IndexPath="~/data/UmbracoExamine/" MaxResults="100">
      <IndexUmbracoFields>
        <add Name="id" /> <!-- REQUIRED -->
        <add Name="nodeName" /> <!-- REQUIRED -->
        <add Name="updateDate" />
        <add Name="writerName" />
        <add Name="path" />
        <add Name="nodeTypeAlias" /> <!-- REQUIRED -->
      </IndexUmbracoFields>
      <!-- The User defined fields to be indexed and searched. The UmbracoIndexer has methods to override the fields to be searched. -->
      <IndexUserFields>
        <add Name="pageTitle"/>
        <add Name="bodyText"/>
      </IndexUserFields>
      <!-- IncludeNodeTypes not required. If not specified, the indexer will index ALL document types-->
      <IncludeNodeTypes>
        <add Name="Forside"/>
      </IncludeNodeTypes>
      <!-- ExcludeNodeTypes not required. If specified, these node types will not be indexed. -->
      <ExcludeNodeTypes/>
    </IndexSet>
  </UmbLuceneIndex>
  .
  .
</configuration>
You're using a very old version of Examine; we don't support that one any more.
Grab the latest from CodePlex. The source includes examples of how you can use it - http://umbracoexamine.codeplex.com/
Thanks again.
It seems I am not able to find a newer version than the one from July 2, 2009?
Ok, I think I found it under "Source Code". Will try it out...
Shan hasn't put any new releases up on CodePlex to download for quite some time. You're best off downloading the source and compiling it yourself.
It also gives you the demos to play with that way ;).
Out-of-band releases have been low on the priority list recently: we're including Examine as part of 4.1, so a separate release keeps getting dropped in favour of that.
I downloaded the latest source, and I have got it up and running... almost. Two problems:
1. The index doesn't update on publish (enableDefaultEventHandler is set to true); it only works if I manually call ExamineManager.Instance.RebuildIndex();
2. As far as I can tell, it completely ignores the custom properties that I specify in IndexUserFields; I only get hits on the standard Umbraco fields, like nodeName.
I would be very grateful for some guidance :)
Thanks
/N
Hi Nalle.
My indexes are updated when I publish content.
My config file looks like the one provided with the source in the test project. However, I noticed at one point that the indexes were not updated after I had been inactive for some time. I then logged out and back in as administrator, and voilà, the indexes were updated again.
However, any search I make returns 0 results, even if I search on the standard Umbraco fields.
My search is done with this command:
var results = ExamineManager.Instance.Search("query", 100, true);
Does anyone know if there is a way to see what is actually indexed?
Cheers
Morten
Hi Morten!
I have some answers :)
The problem with custom properties not being indexed turned out to be a bug (Umbraco versions below 4.1). I managed to find it and created a patch for it on CodePlex. I also found another bug and created a patch for that one too. Head over and download :)
Logging out and back in as administrator solved my problem with the indexes not updating. I have no idea why that does the trick, but thanks :)
And yes, there is a way to see what's indexed: you can do it with Luke (a Java program)
http://www.getopt.org/luke/
Just run the Web Start version and point it at your index folder.
Good luck :)
/N
Hi Nalle!
Thanks a lot :-)
Will get the patch right away...
Cheers
Morten
Yes, Luke is a great tool.
Make sure that you use the very latest version on CodePlex from the source code tab. I'll post up a new release shortly with documentation.
Once you update to the new version and follow the config setup found in the demo/test project, you'll want to make sure that the index is re-created:
Delete your Examine data folder completely (by default this is App_Data/ExamineIndexes). Examine will recreate all the necessary folders for you. Once your config is set up again, just publish a node and Examine will rebuild the entire index.
There IS a bug in 4.0.3 and below which means event listeners (IApplications) are not always instantiated. This is fixed in 4.1, and I'm sure an update will be released for pre-4.1 versions to fix it. If you're affected by the bug, one way to work around it is to remove the Examine DLL from the bin folder, visit a page on your site (this restarts the app pool), then copy the DLL back in and visit a page again. The bug is due to Umbraco trying to find IApplications that are already loaded into its app pool when it should be looking for all IApplications in all of the DLLs. When you move the DLL out and then back in, Umbraco will load that IApplication into its app pool and wire up all of the Examine events. This is a weird bug and doesn't happen all of the time. It will also affect any other packages or custom code that use IApplication.
Hope this helps.
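To illustrate the difference described above between looking only at assemblies that are already loaded and looking at every DLL in the bin folder, here is a rough sketch. It is not Umbraco's actual code: IStartupHandler, SampleHandler and HandlerScanner are made-up names standing in for IApplication and whatever Umbraco does internally, so treat it as an illustration under those assumptions only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical stand-in for an interface such as IApplication.
public interface IStartupHandler { }

// Hypothetical implementation, only here so the scan has something to find.
public class SampleHandler : IStartupHandler { }

public static class HandlerScanner
{
    // Looks only at assemblies the runtime has already loaded.
    // A DLL sitting in the bin folder that has not been loaded yet is missed.
    public static List<Type> FindInLoadedAssemblies()
    {
        return FindIn(AppDomain.CurrentDomain.GetAssemblies());
    }

    // Loads every DLL in the folder first, then scans all of them.
    public static List<Type> FindInFolder(string binFolder)
    {
        var assemblies = new List<Assembly>();
        foreach (string path in Directory.GetFiles(binFolder, "*.dll"))
        {
            try { assemblies.Add(Assembly.LoadFrom(path)); }
            catch (BadImageFormatException) { /* native or invalid DLL, skip */ }
            catch (FileLoadException) { /* could not be loaded, skip */ }
        }
        return FindIn(assemblies);
    }

    private static List<Type> FindIn(IEnumerable<Assembly> assemblies)
    {
        var found = new List<Type>();
        foreach (Assembly assembly in assemblies)
        {
            Type[] types;
            try { types = assembly.GetTypes(); }
            // Some types may fail to load; keep the ones that did.
            catch (ReflectionTypeLoadException ex) { types = ex.Types; }

            foreach (Type type in types)
            {
                if (type != null && type.IsClass && !type.IsAbstract
                    && typeof(IStartupHandler).IsAssignableFrom(type))
                {
                    found.Add(type);
                }
            }
        }
        return found;
    }
}
```

Since .NET loads referenced assemblies lazily, the first method can return different results depending on what has run so far, which would fit with the bug not happening all of the time.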
Thanks for the info Shannon!
I didn't know about the bug with event listeners in 4.0.3; I must have missed that! It explains the problems I have been having with events in a couple of other packages. Is there a patch for this?
Indexing works great now. I'm working on a site with 15,000 documents and growing, due for release in 2 weeks. Can Examine handle that (with the proper hardware, of course)?
Cheers,
/N
Examine should be able to handle TONS of information. Underneath it's Lucene, which is very powerful. Have a look at some Lucene benchmarks; there are tons around (lots are based on the Java version, but that shouldn't make any difference).
The only thing to note with performance is the optimization, which in the latest version runs every 100 commits and on app pool startup. I would have to believe that if your index is really big, optimization might become a bit slower (I'm not 100% positive though). Older versions of Examine don't deal with optimization very well (they try to optimize on every commit), so please make sure you use the latest checkin. I'll hopefully have a release and documentation up in a week or so (I just have to get the rest of my Umbraco 4.1 things done too). From memory, I think I made the optimize threshold (100) configurable, but this will all be in the docs in due time.
I'm glad it's working well for you, and please keep the feedback coming in.
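To make the "optimize every 100 commits" idea concrete, here is a minimal sketch of a commit counter with a configurable threshold. OptimizeScheduler is a made-up helper, not a class from Examine, and Examine's real implementation may well look different. The sketch is also not thread-safe.

```csharp
using System;

// Hypothetical helper, not part of Examine: counts commits and signals
// when an optimize pass is due.
public sealed class OptimizeScheduler
{
    private readonly int _threshold;
    private int _commitsSinceOptimize;

    public OptimizeScheduler(int threshold)
    {
        if (threshold < 1) throw new ArgumentOutOfRangeException("threshold");
        _threshold = threshold;
    }

    // Call once per index commit. Returns true when the caller should
    // run an optimize, and resets the counter at that point.
    public bool RegisterCommit()
    {
        _commitsSinceOptimize++;
        if (_commitsSinceOptimize < _threshold) return false;
        _commitsSinceOptimize = 0;
        return true;
    }
}
```

With a threshold of 100, 250 commits would trigger two optimize passes. A threshold of 1 gives the "optimize on every commit" behaviour of the older versions.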
Hi Nalle,
can you tell me where to find the umbraco-lucene custom properties patch?
Thanks!
Hi there, there doesn't seem to be a specific forum for this, so here goes.
I'm trying to restrict my Examine searches to a specific nodeTypeAlias and not having much luck. Here is my code...
****************
string[] searchFields = { "nodeName", "TitleText", "Summary", "BodyText", "MetaKeywords", "MetaDescription" };
string restrictedNodeTypeAlias = "SMPublication";
var examineQuery = sc
    .NodeTypeAlias(restrictedNodeTypeAlias.MultipleCharacterWildcard())
    .And()
    .GroupedOr(searchFields, queryText)
    .Compile();
var searchResults = ExamineManager.Instance.SearchProviderCollection["SiteIndex"].Search(examineQuery);
****************
I'm using Examine release 52101 on Umbraco 4.0.3. I've looked at the testing examples and as far as I can see I've used the correct syntax. Can anyone spot whether I've done something stupid here?
Many thanks
Neil
Are you getting too many results back, or mismatched results, or something else?
Can you post the Lucene search query that is generated? If you call searchCriteria.ToString() you'll see it; it also appears in the debugger.
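A small sketch of capturing that query string so it can be pasted into the thread. QueryDump is a made-up helper and takes a plain object, so it relies only on ToString() returning the generated query, as described above. It makes no other assumptions about the Examine API.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Examine.
public static class QueryDump
{
    // Writes the text form of the criteria to the trace output and returns it.
    public static string Describe(object searchCriteria)
    {
        if (searchCriteria == null) throw new ArgumentNullException("searchCriteria");
        string luceneQuery = searchCriteria.ToString();
        Trace.WriteLine("Lucene query: " + luceneQuery);
        return luceneQuery;
    }
}
```

In the code posted above, this would be called as QueryDump.Describe(examineQuery) just before the Search call.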
Upgraded to the latest checkin and got everything working, nice! I can't find the max record count parameter anymore though. It's still in the docs, but not in the code?
/N
The max results setting is no longer valid. Due to the internal design of Examine it hadn't done anything for the last few builds, so we've dropped it.
Alright, all good then :)
Happy to report that Umbraco and Examine are now powering a major Swedish site: http://stureplan.se
After a lot of hard work we finally switched from the old platform yesterday, so far so good!
/N
Awesome to know :D
I've been following Examine, trying to understand how to use it in Umbraco 4.0.x, and playing with the tests in the source code package.
Nalle, when I search on something like 'djungelfeber' on your site, I would expect at least one result. Is it not updated regularly?
Thanks for pointing this out Marc. Seems like indexing is not working properly at the moment. On save/publish we now get this in the log:
Error indexing node: Lucene.Net.Store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@D:\web\applications\stureplan.se\wwwroot1\data\_systemUmbracoIndexDontDelete\write.lock
at Lucene.Net.Store.Lock.Obtain(Int64 lockWaitTimeout)
at Lucene.Net.Index.IndexWriter.Init(Directory d, Analyzer a, Boolean create, Boolean closeDir, IndexDeletionPolicy deletionPolicy, Boolean autoCommit, Int32 maxFieldLength, IndexingChain indexingChain, IndexCommit commit)
at Lucene.Net.Index.IndexWriter..ctor(String path, Analyzer a, Boolean create)
at umbraco.cms.businesslogic.index.Indexer.ContentIndex(Boolean ForceRecreation)
at umbraco.cms.businesslogic.index.Indexer.IndexNode(Guid ObjectType, Int32 Id, String Text, String UserName, DateTime CreateDate, Hashtable Fields, Boolean Optimize)
at umbraco.cms.businesslogic.web.Document.Index(Boolean Optimze)
Any ideas?
That's not to do with Examine; it comes from the indexer that is built into the Umbraco 4.0 source code.
The easiest way I've found to solve that problem is to add a handler to the before-indexing event (I think it's on content; I don't have Umbraco open so I can't be 100% sure) and then cancel that event.
We've had to do this in the past when dealing with lots of dynamic node creation.
Was there ever a patch of sorts available anywhere to get Examine working in Umbraco 4.0.x with custom indexes?
As soon as we switch from classic pipeline mode to integrated mode on our Win2008 box, the events fail to complete. The folders are built and the queue files appear, but then it doesn't go any further.
This is becoming quite a desperate situation for us in our current build.
Martin
Try changing the config to runAsync="false" on the indexer. It will then log to the Umbraco log table and you'll be able to see what the problem is.
But we're running Examine on a few 4.0 sites with no problem.
Hi slace
Can you confirm what server and what app pool you run your sites under?
The problem only seems to happen in integrated mode, classic is fine.
We would prefer to use integrated mode because it's recommended by Umbraco and there's all sorts of functionality which only works in this mode.
Martin
The majority of our servers are still IIS6, and the ones I can think of that use Examine in 4.0 are on IIS6.
I've never seen sites run into problems in Classic mode on IIS7 though. I believe that the sites we have on IIS7 are a mixture of classic and integrated app pools.
The problem that Nalle mentioned isn't related to Examine though; that came from the old Lucene.Net implementation.