OK, I changed runAsync to false, deleted the Ecard index folder, recycled the app and published one node. Here are the contents of the log:
id     userId  NodeId  Datestamp                logHeader  logComment
32546  0       -1      2010-09-22 14:06:59.863  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
32545  2       -1      2010-09-22 14:06:52.613  Debug      Xml saved in 00:00:00.0146413
32544  2       2446    2010-09-22 14:06:52.543  Publish
32543  0       -1      2010-09-22 14:06:49.860  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
32542  0       -1      2010-09-22 14:06:49.860  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
... SiteMapProvider errors
32531  2       2446    2010-09-22 14:06:39.910  Publish
32530  2       2446    2010-09-22 14:06:38.567  Save
32529  2       2446    2010-09-22 14:06:26.660  Open
32528  0       -1      2010-09-22 14:06:24.960  System     Application started at 9/22/2010 2:06:24 PM
My code looked something like this:
```csharp
using EM = Examine.ExamineManager;
...
String[] fields = new String[] { "nodeName", "metaTitle", "metaDescription", "metaKeywords", "metaTags" };
var searcher = EM.Instance.SearchProviderCollection["EcardSearcher"];
var criteria = searcher.CreateSearchCriteria().GroupedOr(
    new String[] { "nodeTypeAlias", "nodeTypeAlias" },
    new String[] { Settings.Default.DataTypeEcardAU.ToLower(), Settings.Default.DataTypeEcardNZ.ToLower() });

foreach (String keyword in keywords.Split(new Char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
{
    criteria = criteria.And().GroupedOr(fields, new WildcardValue(keyword));
}

var results = EM.Instance.Search(criteria.Compile());
```
When the last line should have looked like this:
```csharp
var results = searcher.Search(criteria.Compile());
```
Examine Ignoring IncludeNodeTypes
I have the following index configured in ExamineIndex.config:
My understanding from the documentation is that indexes are supposed to be opt-in; that is to say, if you define at least one inclusion, it will ignore all others.
However, in all of my testing, search results return all document types. I've deleted and rebuilt the index a few times, so I know it's not stale data.
For now I'm working around this by adding search criteria based on the nodeTypeAlias field, but it's not a great solution.
P.S. This is a vanilla implementation, with no custom publishing events etc.
Are there any errors in the umbraco log? Does the index only contain the fields you specified?
Hi slace. Good questions.
OK, I changed runAsync to false, deleted the Ecard index folder, recycled the app and published one node. Here are the contents of the log:
To answer your second question, I can't use Luke because I can't install the JRE. But, I looked through the Fields collection of a SearchResult, and it includes everything, not just the fields I specified.
Here's a piece of code:
Hope you can help.
Edit - and here are the relevant bits from ExamineSettings.config:
Not sure if this is helpful, but search.Compile().ToString() currently outputs this:
So are you getting items which are from doc types different from EcardAU in the above code? The config looks all fine to me (and all the sites we've implemented examine on understand the allowed node types properly).
One error I did notice in your search is in your Lucene query:
Because the first clause starts with a '+', that clause must always match, and because of this the OR condition doesn't really do anything. What you want is NodeName().And().GroupedOr
Check this post I did on it: http://farmcode.org/post/2010/08/12/How-to-build-a-search-query-in-Examine.aspx
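To make the '+' behaviour concrete, here is a small self-contained C# sketch that models how Lucene combines required ('+') and optional clauses. It does not call Examine or Lucene at all; the `Occur`, `Clause` and `BooleanQueryModel` names are hypothetical, and scoring is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of Lucene boolean clauses. None of these types come from Examine or Lucene.
public enum Occur { Must, Should }

public sealed class Clause
{
    public Occur Occur { get; }
    public Func<ISet<string>, bool> Test { get; }

    public Clause(Occur occur, Func<ISet<string>, bool> test)
    {
        Occur = occur;
        Test = test;
    }
}

public static class BooleanQueryModel
{
    // A document matches when every Must clause matches. Should clauses only
    // decide the match when there are no Must clauses at all; otherwise they
    // would merely influence scoring, which this sketch ignores.
    public static bool Matches(ISet<string> docTerms, IEnumerable<Clause> clauses)
    {
        var list = clauses.ToList();
        var musts = list.Where(c => c.Occur == Occur.Must).ToList();
        var shoulds = list.Where(c => c.Occur == Occur.Should).ToList();

        if (musts.Count > 0)
        {
            return musts.All(c => c.Test(docTerms));
        }
        return shoulds.Any(c => c.Test(docTerms));
    }

    // A clause that matches when the document contains the given "field:value" term.
    public static Clause Term(Occur occur, string term)
    {
        return new Clause(occur, doc => doc.Contains(term));
    }

    // A grouped OR, i.e. "(a b c)", wrapped as a single clause.
    public static Clause AnyOf(Occur occur, params string[] terms)
    {
        return new Clause(occur, doc => terms.Any(t => doc.Contains(t)));
    }
}
```

With a document that only contains the term `nodeTypeAlias:ecardau`, a query shaped like `+nodeTypeAlias:ecardau (metaTitle:xmas metaTags:xmas)` still matches in this model, because the optional group is never required. Marking the group as `Occur.Must`, which is roughly what chaining `.And().GroupedOr(...)` is meant to produce, makes the same document fail to match.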
Thanks slace. I'm figuring out the query syntax as I go, so that article was very useful. The query now looks like this:
The above filters out everything but ecards, and logically appends keywords within groups.
Now my biggest problem is that metaTags - which is a Tag Picker field - has its values stored as a comma-separated list by umbraco. In other words, the start of each word isn't being picked up because it's preceded by a comma instead of a space. Arg!
Any helpful tips you can give me to work around that would be appreciated :)
Edit: Never mind, that site has an article that shows how to intercept and alter the values as they're being indexed. I can just replace the commas with spaces and bam! It should work. I'll let you know how it goes.
Yep, doing a string replacement on the ',' with ' ' is the simplest solution for comma-separated values :)
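For anyone following along, the replacement itself can be sketched as a small helper that works on a plain dictionary of field values. `TagFieldNormalizer` is a hypothetical name, and the wiring shown in the trailing comment (the `GatheringNodeData` event, `e.Fields`, and an indexer called "EcardIndexer") is an assumption about an Examine 1.x style setup, not something confirmed in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class TagFieldNormalizer
{
    // Replaces commas with spaces in the named field so that each tag can be
    // tokenized as its own word. Returns true when the field was present and
    // rewritten, false when it was missing or empty.
    public static bool NormalizeField(IDictionary<string, string> fields, string fieldName)
    {
        if (fields == null) throw new ArgumentNullException(nameof(fields));

        string raw;
        if (!fields.TryGetValue(fieldName, out raw) || string.IsNullOrEmpty(raw))
        {
            return false;
        }

        fields[fieldName] = raw.Replace(',', ' ');
        return true;
    }
}

// Assumed wiring, to be checked against your own Examine build.
// "EcardIndexer" is a hypothetical provider name:
//
// var indexer = ExamineManager.Instance.IndexProviderCollection["EcardIndexer"];
// indexer.GatheringNodeData += (sender, e) =>
//     TagFieldNormalizer.NormalizeField(e.Fields, "metaTags");
```

Keeping the string handling separate from the event hookup means it can be tried against an ordinary `Dictionary<string, string>` before touching the index. The index needs to be rebuilt afterwards for existing nodes to pick up the change.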
Huzzah! That's done it, and the search works just the way I need it to.
Is it still indexing everything? Yep. Do I care? Hell no :P
Cheers mate.
I've just blogged about the comma-separated stuff for future reference: http://farmcode.org/post/2010/09/22/Searching-Multi-Node-Tree-Picker-data-(or-any-collection)-with-Examine.aspx
Great stuff. That technique is going to come in handy for a lot of search applications.
Just a follow up on the original topic. It turns out Examine was indexing correctly and observing IncludeNodeTypes exactly as it should. I had a look at the raw binary in the index set, and it's exactly as I wanted it - just ecards and only with the fields defined.
Instead, the problem seems to be related to binding between the searcher and the index. It's hard to tell, but it acts like all searchers get bound to the index of the defaultProvider defined in ExamineSettings.config.
I did a bunch of experimenting, and the results change to whatever index is associated with the defaultProvider. What's strangest of all is that, in the code itself, I interrogated each UmbracoExamineSearcher, and they all knew exactly who they were (by name) and what IndexSet they were supposed to use, right down to the file path. But the (search) results speak for themselves.
Making EcardSearcher the defaultProvider gives me the exact results I want, switching defaultProvider to InternalSearcher gives more than I want and InternalMemberSearcher gives me none, as you'd expect.
So for now, this is what my ExamineSearchProviders section of ExamineSettings.config file looks like, and this works:
P.S. I'm using Examine build 57217 + umbraco 4.5.2 on Windows 2008 x64 + .NET Framework 4
Thought I should update this topic, since I did end up figuring out what my problem was. That is, why my code always seemed to use whatever the default searcher was. It was the same problem that made me think IncludeNodeTypes was being ignored.
Hope it helps others who might be new to Examine.
My code looked something like this:
When the last line should have looked like this:
I feel a bit dumb now, but for some reason I assumed the criteria would carry the index information, when in fact that's held by the searcher. So by using Instance.Search in the last line, I was really asking Examine to search the default index... duh!
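The same mistake can be reproduced in a toy model with no Examine dependency. This is only a sketch: `ToySearcher` and `ToyManager` are hypothetical stand-ins, and it assumes the manager-level Search simply forwards to whichever provider is configured as the default, which is the behaviour described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a searcher: it owns the index it searches.
public sealed class ToySearcher
{
    private readonly List<string> _index;

    public string Name { get; }

    public ToySearcher(string name, IEnumerable<string> indexedDocs)
    {
        Name = name;
        _index = indexedDocs.ToList();
    }

    // The criteria only describe what to look for; they carry no index information.
    public List<string> Search(Func<string, bool> criteria)
    {
        return _index.Where(criteria).ToList();
    }
}

// Hypothetical stand-in for the manager: a named collection plus a default provider.
public sealed class ToyManager
{
    private readonly Dictionary<string, ToySearcher> _providers;
    private readonly string _defaultProvider;

    public ToyManager(string defaultProvider, params ToySearcher[] searchers)
    {
        _defaultProvider = defaultProvider;
        _providers = searchers.ToDictionary(s => s.Name);
    }

    public ToySearcher this[string name]
    {
        get { return _providers[name]; }
    }

    // Searching through the manager always hits the default provider's index,
    // regardless of which searcher was used to build the criteria.
    public List<string> Search(Func<string, bool> criteria)
    {
        return _providers[_defaultProvider].Search(criteria);
    }
}
```

If "InternalSearcher" is the default and indexes every document type while "EcardSearcher" indexes only ecards, then `manager.Search(criteria)` returns the broader set and `manager["EcardSearcher"].Search(criteria)` returns only ecards, even though the criteria are identical.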
P.S. Congrats on getting Examine 1.0 out the door :)