
  • Alex Wilks 27 posts 56 karma points
    Nov 26, 2012 @ 17:50
    Alex Wilks
    0

    Examine with Multi-node picker

    Hi guys,

    I have an interesting search requirement which I'm not sure how to tackle. Basically, I have a large amount of data (let's say books, for example) and each book must be written by at least one author, possibly more. Both books and authors are represented as nodes, and authors are associated with a book via a multi-node picker which stores the values as a CSV (it must be CSV and not XML because of the import method...). The site will allow you to see all books written by (or co-written by) an author, and this currently happens by looking at all book nodes where the author field contains the ID of the author. This is painfully slow, so we were hoping to use an Examine search to speed this along. However, although my index is good (checked in Luke), I'm not seeing any results. Below is the config I'm using - hopefully one of the gurus here can help me solve this.

    <IndexSet SetName="BookFilterIndexSet"
        IndexPath="~/App_Data/TEMP/ExamineIndexes/BookFilter/"
        IndexParentId="-1">
        <IndexAttributeFields>
            <add Name="id" EnableSorting="true" Type="Number" />
            <add Name="nodeName" EnableSorting="true" />
        </IndexAttributeFields>
        <IndexUserFields>
            <add Name="title" />
            <add Name="authors" />
        </IndexUserFields>
        <IncludeNodeTypes>
            <add Name="BookItem" />
        </IncludeNodeTypes>
        <ExcludeNodeTypes />
    </IndexSet>

    "authors" is the property that holds the CSV of author IDs.

    <Examine>
      <ExamineIndexProviders>
        <providers>
          <add name="AuthorFilterIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                 indexSet="AuthorFilterIndexSet"
                 supportUnpublished="false"
                 supportProtected="false"
                 analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
                 enableDefaultEventHandler="true"/>
        </providers>
      </ExamineIndexProviders>

      <ExamineSearchProviders defaultProvider="InternalSearcher">
        <providers>
          <add name="AuthorFilterSearcher"
                 type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
                 indexSet="AuthorFilterIndexSet"
                 analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
        </providers>
      </ExamineSearchProviders>
    </Examine>

    And below is the code I use to call the search:

    @using umbraco.MacroEngines;
    @using Examine;
    @using Examine.SearchCriteria;
    @{
        string searchTerm = umbraco.NodeFactory.Node.getCurrentNodeId().ToString();
        var Searcher = ExamineManager.Instance.SearchProviderCollection["AuthorFilterSearcher"];
        var searchCriteria = Searcher.CreateSearchCriteria(BooleanOperation.Or);
        var luceneString = "";
    
        luceneString += "authors:";
        luceneString += "(+" + searchTerm.Replace(" ", " +") + ")^5 ";
        luceneString += "authors:" + searchTerm;
        Response.Write("Lucene string:<br />" + luceneString + "<br />");
        var query = searchCriteria.RawQuery(luceneString);
        var searchResults = Searcher.Search(query).OrderByDescending(x => x.Score);
        Response.Write("Result count: " + searchResults.Count().ToString());
    }

    I tested the above on a page with ID 8518 (there should be two results) and this was the output:

    Lucene string:
    authors:(+8518)^5 authors:8518
    Result count: 0

    In Luke, I can get both results using the following string, but it doesn't work if there are multiple authors and the author ID you're searching for is the second author rather than the first:

    authors:(+8518)^5 authors:8518*

    Can anyone help?

    Thanks for looking

    Alex

  • Morten Bock 1867 posts 2140 karma points MVP 2x admin c-trib
    Nov 26, 2012 @ 19:35
    Morten Bock
    1

    I think one of the problems is that the list is comma separated and not space separated. I think I see two options.

    1: Allow leading wildcards. There is a way to allow the query to use leading wildcards, so you could use a search string like "author:*12345*", but that would also match a partial ID, for example 123456.

    2: Change the indexed data. There is a way to hook into an event where you can add data to the index as a node is being indexed. So you could either add a new field, or convert the existing one, so that "123,234,345" is transformed into "123 234 345". That would make it possible to search for "author:+123" with no wildcards and get the right results.
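
    Option 2 could be sketched roughly as below. This is a minimal sketch rather than the code either poster used: it assumes Umbraco 4.x (where subclasses of ApplicationBase are instantiated at startup) and Examine's GatheringNodeData event, and it reuses the "AuthorFilterIndexer" and "authors" names from the config in the question. The class name AuthorIndexEvents and the helper CsvToTerms are hypothetical. It overwrites the indexed value of the existing field rather than adding a new one, so the stored property on the node is left untouched.

    ```csharp
    using System;
    using Examine;
    using UmbracoExamine;
    using umbraco.BusinessLogic;

    // Hypothetical class name. Assumes Umbraco 4.x, which creates
    // ApplicationBase subclasses once when the application starts.
    public class AuthorIndexEvents : ApplicationBase
    {
        public AuthorIndexEvents()
        {
            // "AuthorFilterIndexer" is the indexer name from the question's config.
            ExamineManager.Instance.IndexProviderCollection["AuthorFilterIndexer"]
                .GatheringNodeData += OnGatheringNodeData;
        }

        private static void OnGatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            // Only touch content nodes; media and other index types are skipped.
            if (e.IndexType != IndexTypes.Content) return;

            string csv;
            if (e.Fields.TryGetValue("authors", out csv) && !string.IsNullOrEmpty(csv))
            {
                // Replace the value that goes into the index: "123,234,345" -> "123 234 345".
                e.Fields["authors"] = CsvToTerms(csv);
            }
        }

        // Hypothetical helper: turns a comma-separated ID list into a
        // space-separated one so the analyzer sees each ID as its own term.
        public static string CsvToTerms(string csv)
        {
            string[] ids = csv.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            return string.Join(" ", ids);
        }
    }
    ```

    The index has to be rebuilt (or the nodes republished) before existing documents pick up the new value. After that, a plain term query such as authors:8518 should match regardless of where the ID sits in the list.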

  • Alex Wilks 27 posts 56 karma points
    Nov 26, 2012 @ 19:41
    Alex Wilks
    0

    About 15 minutes ago I tried your number 2 solution and it appeared to work. Since then, I've been re-publishing thousands of items to hit my new "on publish" event so that a label is populated with the correct IDs. I was in the middle of composing a reply saying I'd solved it when I got an email informing me of an update to my post!

    Thanks so much for getting back to me Morten.
