
  • RoboDog 21 posts 41 karma points
    Aug 12, 2010 @ 12:38

    Lucene Index and case

    Hi, I'm having an issue with the Lucene index. For example, suppose I have a doctype called customer, which in turn has a property called name. When I create a document and enter the name property as "John Doe", it is stored in the index as "john doe". How do I retain the correct casing?

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Aug 12, 2010 @ 14:40

    This answer was originally posted here: http://our.umbraco.org/forum/developers/extending-umbraco/10999-Examine-Questions?p=0#comment42951

    You have to store both a case-sensitive and a case-insensitive version of the data, as Lucene isn't really designed for data retrieval.

    To do this with Examine you have to attach to the UmbracoExamine.LuceneExamineIndexer.DocumentWriting event (which may have moved into the LuceneEngine with the latest check-ins, I'm not 100% sure).

    This event is fired in a Lucene scope and gives you access to the Lucene Document object as it's being written to. That is where you add the un-analyzed version of your content.

    Here's an example of how we did it in a recent project for showing in search results:

    void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
    {
        var doc = e.Document;

        // Find the title
        string title = !e.Fields.ContainsKey("PageTitle") || string.IsNullOrEmpty(e.Fields["PageTitle"])
            ? e.Fields["nodeName"]
            : e.Fields["PageTitle"];

        // Default content is nothing:
        string content = string.Empty;

        // Unless a description is found:
        if (e.Fields.ContainsKey("Description") && !string.IsNullOrEmpty(e.Fields["Description"]))
        {
            content = e.Fields["Description"];
        }
        // Or BodyContent is found:
        else if (e.Fields.ContainsKey("BodyContent") && !string.IsNullOrEmpty(e.Fields["BodyContent"]))
        {
            content = e.Fields["BodyContent"];
        }

        // Store the title and content with text casing unchanged
        doc.Add(new Field("__PageTitle", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("__Content", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
    }
    When we display the search results, we show the __PageTitle and __Content fields rather than the 'real' fields.
    Check out this article I wrote to better understand the Store and Index concepts: www.aaron-powell.com/documents-in-lucene-net

  • RoboDog 21 posts 41 karma points
    Aug 12, 2010 @ 15:10

    That's cool, exactly what I was looking for :) One stupid question: how and where do I attach to the event from within my code, so that it executes when the index runs?

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Aug 13, 2010 @ 01:06

    You can use ApplicationBase, just as you would for an Umbraco event, or you can use an HttpModule and wire it up early in the life cycle.

    I'd go with ApplicationBase personally.
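
    A rough, untested sketch of the ApplicationBase route might look like the following. The class name ExamineEvents and the provider name "MyIndexer" are placeholders (use the indexer name from your own Examine configuration), and the namespaces and the LuceneExamineIndexer type are assumptions that may differ between Examine builds, since the event may have moved into the LuceneEngine.

    ```csharp
    using Examine;
    using Lucene.Net.Documents;
    using UmbracoExamine;               // assumption: location of LuceneExamineIndexer in this build
    using umbraco.BusinessLogic;

    // Umbraco instantiates ApplicationBase subclasses at application start,
    // so the constructor is a convenient place to subscribe to events.
    public class ExamineEvents : ApplicationBase
    {
        public ExamineEvents()
        {
            // "MyIndexer" is a hypothetical provider name; replace it with your own.
            var indexer = ExamineManager.Instance.IndexProviderCollection["MyIndexer"]
                as LuceneExamineIndexer;

            if (indexer != null)
            {
                indexer.DocumentWriting += indexer_DocumentWriting;
            }
        }

        void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
        {
            // Add an un-analyzed copy of the node name so its casing is preserved.
            if (e.Fields.ContainsKey("nodeName"))
            {
                e.Document.Add(new Field("__nodeName", e.Fields["nodeName"],
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            }
        }
    }
    ```

    The handler here only copies nodeName into a hypothetical __nodeName field; in practice you would drop in the fuller handler from the earlier post. Documents indexed before the handler was attached won't have the extra field until the index is rebuilt.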
