Qube 74 posts 116 karma points

Sep 22, 2010 @ 03:56

Examine Ignoring IncludeNodeTypes

I have the following index configured in ExamineIndex.config:

<IndexSet SetName="EcardIndexSet" IndexPath="~/App_Data/ExamineIndexes/Ecard/">
    <IndexAttributeFields>
        <add Name="id" />
        <add Name="nodeName" />
        <add Name="nodeTypeAlias" />
    </IndexAttributeFields>
    <IndexUserFields>
        <add Name="metaTitle" />
        <add Name="metaDescription" />
        <add Name="metaKeywords" />
        <add Name="metaTags" />
    </IndexUserFields>
    <IncludeNodeTypes>
        <add Name="EcardAU" />
        <add Name="EcardNZ" />
    </IncludeNodeTypes>
    <ExcludeNodeTypes />
</IndexSet>

My understanding from the documentation is that indexes are supposed to be opt-in; that is to say, if you define at least one inclusion, it will ignore all others.
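
To make that expectation concrete, here is a toy model of opt-in filtering as I understand it. It is only a sketch and not Examine's actual implementation: the class and method names are made up, the "Homepage" alias in the usage note is hypothetical, and the case-sensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Toy model of opt-in indexing. Hypothetical helper, not part of Examine.
public static class NodeTypeFilter
{
    // Returns true when a node with the given doc type alias should be indexed.
    // Assumption: an empty include list means "index every type",
    // and the exclude list is always honoured.
    public static bool ShouldIndex(
        string nodeTypeAlias,
        ICollection<string> includeNodeTypes,
        ICollection<string> excludeNodeTypes)
    {
        if (excludeNodeTypes.Contains(nodeTypeAlias))
        {
            return false;
        }

        if (includeNodeTypes.Count == 0)
        {
            return true;
        }

        // At least one inclusion is defined, so every other type is skipped.
        return includeNodeTypes.Contains(nodeTypeAlias);
    }
}
```

With the include list from the config above, ShouldIndex("EcardAU", ...) is true and ShouldIndex("Homepage", ...) is false, which is the behaviour I expected to see in the search results.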

However, in all of my testing, search results return all document types. I've deleted and rebuilt the index a few times, so I know it's not stale data.

For now I'm working around this by adding search criteria based on the nodeTypeAlias field, but it's not a great solution.

P.S. This is a vanilla implementation, with no custom publishing events etc.

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 22, 2010 @ 05:42

Are there any errors in the umbraco log? Does the index only contain the fields you specified?

Qube 74 posts 116 karma points

Sep 22, 2010 @ 06:21

Hi slace. Good questions.

OK, I changed runAsync to false, deleted the Ecard index folder, recycled the app and published one node. Here are the contents of the log:

id     userId  NodeId  Datestamp                logHeader  logComment
32546  0       -1      2010-09-22 14:06:59.863  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
32545  2       -1      2010-09-22 14:06:52.613  Debug      Xml saved in 00:00:00.0146413
32544  2       2446    2010-09-22 14:06:52.543  Publish
32543  0       -1      2010-09-22 14:06:49.860  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
32542  0       -1      2010-09-22 14:06:49.860  Error      [UmbracoExamine] Cannot index queue items, another indexing operation is currently in progress,
... SiteMapProvider errors
32531  2       2446    2010-09-22 14:06:39.910  Publish
32530  2       2446    2010-09-22 14:06:38.567  Save
32529  2       2446    2010-09-22 14:06:26.660  Open
32528  0       -1      2010-09-22 14:06:24.960  System     Application started at 9/22/2010 2:06:24 PM

To answer your second question, I can't use Luke because I can't install the JRE. But, I looked through the Fields collection of a SearchResult, and it includes everything, not just the fields I specified.

Here's a piece of code:

String keywords = Request.QueryString[Settings.Default.QueryStringKeywords];
if (!String.IsNullOrWhiteSpace(keywords)) {
    ISearchCriteria search = EM.Instance.SearchProviderCollection["EcardSearcher"].CreateSearchCriteria().NodeName(keywords).Or().GroupedOr(new String[] { "metaTitle", "metaDescription", "metaKeywords", "metaTags" }, keywords).Compile();
    foreach (SearchResult item in EM.Instance.Search(search)) {
        // Dump every field stored for the first result only.
        foreach (var field in item.Fields) {
            Response.Write(field.Key + " = " + field.Value + "<br />");
        }
        break;
    }
}

Hope you can help.

Edit - and here are the relevant bits from ExamineSettings.config:

<Examine>
    <ExamineIndexProviders>
        <providers>
            <add name="EcardIndexer" type="UmbracoExamine.LuceneExamineIndexer, UmbracoExamine"
                 runAsync="false"
                 supportUnpublished="false"
                 supportProtected="false"
                 interval="10"
                 analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        </providers>
    </ExamineIndexProviders>
    <ExamineSearchProviders defaultProvider="InternalSearcher">
        <providers>
            <add name="EcardSearcher" type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
                 analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        </providers>
    </ExamineSearchProviders>
</Examine>

Qube 74 posts 116 karma points

Sep 22, 2010 @ 07:12

Not sure if this is helpful, but search.Compile().ToString() currently outputs this:

{ SearchIndexType: , LuceneQuery: +nodeTypeAlias:EcardAU (nodeName:a* 
metaTitle:a* metaDescription:a* metaKeywords:a* metaTags:a*) }

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 22, 2010 @ 08:08

So are you getting items which are from doc types different from EcardAU in the above code? The config looks all fine to me (and all the sites we've implemented examine on understand the allowed node types properly).

One error I did notice in your search is what your Lucene query is:

+nodeTypeAlias:EcardAU (nodeName:a* metaTitle:a* metaDescription:a* metaKeywords:a* metaTags:a*)

Because the first statement starts with a '+' it means that that query must always match, and because of this the OR condition doesn't really do anything. What you want is NodeName().And().GroupedOr

Check this post I did on it: http://farmcode.org/post/2010/08/12/How-to-build-a-search-query-in-Examine.aspx
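
For anyone who wants to see why the optional group stops mattering, here is a rough sketch of how Lucene decides whether a document matches a boolean query. It is a simplified model with made-up names, not Lucene's code: it ignores prohibited ('-') clauses and scoring, and it takes the per-clause results as plain booleans.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of boolean query matching. Hypothetical helper, not Lucene.NET.
public static class BooleanQueryModel
{
    // requiredResults: one entry per '+' clause, true if that clause matched the document.
    // optionalResults: one entry per clause with no prefix.
    public static bool Matches(IList<bool> requiredResults, IList<bool> optionalResults)
    {
        if (requiredResults.Count > 0)
        {
            // Once a required clause exists, optional clauses only influence
            // scoring; they no longer decide whether the document matches.
            return requiredResults.All(r => r);
        }

        // With no required clauses, at least one optional clause must match.
        return optionalResults.Any(r => r);
    }
}
```

So a document of type EcardAU matches the query above even when none of the a* clauses hit, because Matches(new[] { true }, new[] { false, false }) returns true.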

Qube 74 posts 116 karma points

Sep 22, 2010 @ 08:24

Thanks slace. I'm figuring out the query syntax as I go, so that article was very useful. The query now looks like this:

{ SearchIndexType: , LuceneQuery: +(nodeTypeAlias:ecardau nodeTypeAlias:ecardnz)
+(nodeName:a* metaTitle:a* metaDescription:a* metaKeywords:a* metaTags:a*)
+(nodeName:b* metaTitle:b* metaDescription:b* metaKeywords:b* metaTags:b*)
+(nodeName:c* metaTitle:c* metaDescription:c* metaKeywords:c* metaTags:c*) }

The above filters out everything but ecards, and logically appends keywords within groups.

Now my biggest problem is that metaTags - which is a Tag Picker field - has its values stored as a comma-separated list by umbraco. In other words, the start of each word isn't being picked up because it's preceded by a comma instead of a space. Arg!

Any helpful tips you can give me to work around that would be appreciated :)

Edit: Never mind, that site has an article that shows how to intercept and alter the values as they're being indexed. I can just replace the commas with spaces and bam! It should work. I'll let you know how it goes.

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 22, 2010 @ 08:29

Yep, doing a string replacement on the ',' with ' ' is the simplest solution for comma-separated values :)
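
A minimal sketch of that replacement, plus a stand-in for whitespace tokenizing so you can see the effect. The class and method names are made up for illustration, the tag values in the usage note are invented, and wiring it into the indexing event is left out; the split on whitespace only approximates what WhitespaceAnalyzer does.

```csharp
using System;

// Hypothetical helper for illustration; not part of Examine or umbraco.
public static class TagNormalizer
{
    // Turns "a,b,c" into "a b c" so a whitespace-based analyzer sees separate terms.
    public static string CommasToSpaces(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        return value.Replace(',', ' ');
    }

    // Rough stand-in for whitespace tokenizing: split on whitespace, drop empty entries.
    public static string[] WhitespaceTokens(string value)
    {
        return value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Before the replacement, WhitespaceTokens("birthday,funny") yields the single token "birthday,funny", so a wildcard like funny* never sees the start of the second word. After CommasToSpaces it yields two tokens, "birthday" and "funny".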

Qube 74 posts 116 karma points

Sep 22, 2010 @ 08:45

Huzzah! That's done it, and the search works just the way I need it to.

Is it still indexing everything? Yep. Do I care? Hell no :P

Cheers mate.

Aaron Powell 1708 posts 3046 karma points c-trib

Sep 22, 2010 @ 09:11

I've just blogged about the comma-separated stuff for future reference: http://farmcode.org/post/2010/09/22/Searching-Multi-Node-Tree-Picker-data-(or-any-collection)-with-Examine.aspx

Qube 74 posts 116 karma points

Sep 23, 2010 @ 06:01

Great stuff. That technique is going to come in handy for a lot of search applications.

Just a follow up on the original topic. It turns out Examine was indexing correctly and observing IncludeNodeTypes exactly as it should. I had a look at the raw binary in the index set, and it's exactly as I wanted it - just ecards and only with the fields defined.

Instead, the problem seems to be related to binding between the searcher and the index. It's hard to tell, but it acts like all searchers get bound to the index of the defaultProvider defined in ExamineSettings.config.

I did a bunch of experimenting, and the results change to whatever index is associated with the defaultProvider. What's strangest of all is that, in the code itself, I interrogated each UmbracoExamineSearcher, and they all knew exactly who they were (by name) and what IndexSet they were supposed to use, right down to the file path. But the (search) results speak for themselves.

Making EcardSearcher the defaultProvider gives me the exact results I want, switching defaultProvider to InternalSearcher gives more than I want and InternalMemberSearcher gives me none, as you'd expect.

So for now, this is what my ExamineSearchProviders section of ExamineSettings.config file looks like, and this works:

<ExamineSearchProviders defaultProvider="EcardSearcher">
    <providers>
        <add name="EcardSearcher" type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
             analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        <add name="InternalSearcher" type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
             analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
        <add name="InternalMemberSearcher" type="UmbracoExamine.LuceneExamineSearcher, UmbracoExamine"
             analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
    </providers>
</ExamineSearchProviders>

P.S. I'm using Examine build 57217 + umbraco 4.5.2 on Windows 2008 x64 + .NET Framework 4

Qube 74 posts 116 karma points

Oct 18, 2010 @ 04:21

Thought I should update this topic, since I did end up figuring out what my problem was. That is, why my code always seemed to use whatever the default searcher was. It was the same problem that made me think IncludeNodeTypes was being ignored.

Hope it helps others who might be new to Examine.

My code looked something like this:

using EM = Examine.ExamineManager;
...
String[] fields = new String[] { "nodeName", "metaTitle", "metaDescription", "metaKeywords", "metaTags" };
var searcher = EM.Instance.SearchProviderCollection["EcardSearcher"];
var criteria = searcher.CreateSearchCriteria().GroupedOr(new String[] { "nodeTypeAlias", "nodeTypeAlias" }, new String[] { Settings.Default.DataTypeEcardAU.ToLower(), Settings.Default.DataTypeEcardNZ.ToLower() });
foreach (String keyword in keywords.Split(new Char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) {
    criteria = criteria.And().GroupedOr(fields, new WildcardValue(keyword));
}
var results = EM.Instance.Search(criteria.Compile());

When the last line should have looked like this:

var results = searcher.Search(criteria.Compile());

I feel a bit dumb now, but for some reason I assumed the criteria would carry the index information, when in fact that's held by the searcher. So by using Instance.Search in the last line, I was really asking Examine to search the default index... duh!
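
In case it helps someone picture it, here is a toy model of the mistake. None of these classes are Examine's; the Toy* names, the query string and the "InternalIndexSet" name in the usage note are just for illustration. The only point is that the criteria object holds a query and nothing else, so whoever runs the search decides which index gets hit.

```csharp
using System;
using System.Collections.Generic;

// Toy model only. These types are hypothetical and do not mirror Examine's internals.
public sealed class ToyCriteria
{
    public string Query { get; private set; }

    public ToyCriteria(string query)
    {
        Query = query;
    }
}

public sealed class ToySearcher
{
    public string IndexName { get; private set; }

    public ToySearcher(string indexName)
    {
        IndexName = indexName;
    }

    // The criteria only carries the query; it does not remember which searcher built it.
    public ToyCriteria CreateSearchCriteria(string query)
    {
        return new ToyCriteria(query);
    }

    // Reports which index was searched, standing in for real results.
    public string Search(ToyCriteria criteria)
    {
        return IndexName + ": " + criteria.Query;
    }
}

public sealed class ToyManager
{
    private readonly Dictionary<string, ToySearcher> _searchers =
        new Dictionary<string, ToySearcher>();
    private readonly string _defaultName;

    public ToyManager(string defaultName)
    {
        _defaultName = defaultName;
    }

    public void Add(string name, ToySearcher searcher)
    {
        _searchers[name] = searcher;
    }

    public ToySearcher this[string name]
    {
        get { return _searchers[name]; }
    }

    // Searching through the manager always goes to the default searcher.
    public string Search(ToyCriteria criteria)
    {
        return _searchers[_defaultName].Search(criteria);
    }
}
```

If the manager's default is "InternalSearcher" and you build the criteria from manager["EcardSearcher"], then manager.Search(criteria) reports "InternalIndexSet: ..." while searcher.Search(criteria) reports "EcardIndexSet: ...", which is the difference that tripped me up.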

P.S. Congrats on getting Examine 1.0 out the door :)
