Lucene Index and case

RoboDog 21 posts 41 karma points

Aug 12, 2010 @ 12:38

Hi, I'm having an issue with the Lucene index. For example, suppose I have a doctype called customer, which in turn has a property called name. When I create a document and enter the name property as "John Doe", it is stored in the index as "john doe". How do I retain the correct casing?

Aaron Powell 1708 posts 3046 karma points c-trib

Aug 12, 2010 @ 14:40

This answer was originally posted here: http://our.umbraco.org/forum/developers/extending-umbraco/10999-Examine-Questions?p=0#comment42951

You have to store both a case-sensitive and a case-insensitive version of the data, as Lucene isn't really designed for data retrieval.

To do this with Examine you have to attach to the UmbracoExamine.LuceneExamineIndexer.DocumentWriting event (which may have moved into the LuceneEngine with the latest check ins, I'm not 100% sure).

This event is fired in a Lucene scope and gives you access to the Lucene Document object as it's being written to. This is where you need to add the un-analyzed version of your content.

Here's an example of how we did it in a recent project for showing in search results:

// Needs "using Lucene.Net.Documents;" for the Field type.
void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
    var doc = e.Document;

    // Find the title: use PageTitle when it has a value, otherwise fall back to nodeName
    string title = !e.Fields.ContainsKey("PageTitle") || string.IsNullOrEmpty(e.Fields["PageTitle"])
        ? e.Fields["nodeName"]
        : e.Fields["PageTitle"];

    // Default content is nothing:
    string content = string.Empty;
    // Unless a description is found:
    if (e.Fields.ContainsKey("Description") && !string.IsNullOrEmpty(e.Fields["Description"]))
    {
        content = e.Fields["Description"];
    }
    // Or BodyContent is found:
    else if (e.Fields.ContainsKey("BodyContent") && !string.IsNullOrEmpty(e.Fields["BodyContent"]))
    {
        content = e.Fields["BodyContent"];
    }

    // Store the title and content with text casing unchanged.
    // NOT_ANALYZED keeps each value as a single, unmodified term.
    doc.Add(new Field("__PageTitle", title, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("__Content", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
}

When we display the search results we show the __PageTitle and __Content fields, not the 'real' fields.
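As a rough illustration of the display side, here is a minimal sketch that picks the case-preserved field when it exists. It assumes each search result exposes its stored fields as a string dictionary; the class and method names (SearchResultDisplay, GetDisplayValue) are made up for this example and are not part of Examine.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of Examine or Umbraco.
public static class SearchResultDisplay
{
    // Returns the value stored under storedKey (e.g. "__PageTitle") when it
    // is present and non-empty; otherwise falls back to fallbackKey
    // (e.g. "nodeName"), or to an empty string if neither field exists.
    public static string GetDisplayValue(
        IDictionary<string, string> fields, string storedKey, string fallbackKey)
    {
        string value;
        if (fields.TryGetValue(storedKey, out value) && !string.IsNullOrEmpty(value))
        {
            return value;
        }
        return fields.TryGetValue(fallbackKey, out value) ? value : string.Empty;
    }
}
```

With a result's field dictionary in hand, the title would then be read as SearchResultDisplay.GetDisplayValue(fields, "__PageTitle", "nodeName").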

Check out this article I wrote to better understand the Store and Index concepts: www.aaron-powell.com/documents-in-lucene-net

RoboDog 21 posts 41 karma points

Aug 12, 2010 @ 15:10

That's cool, exactly what I was looking for :) One stupid question: how / where do I attach to the event from within my code so that it executes when the index runs?

Aaron Powell 1708 posts 3046 karma points c-trib

Aug 13, 2010 @ 01:06

You can use ApplicationBase, as you would for an Umbraco event, or you can use an HttpModule and wire it up early in the life cycle.

I'd go with ApplicationBase personally.
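A minimal sketch of the ApplicationBase approach, for the customer/name case from the original question. Several things here are assumptions rather than confirmed API: the index name "MyIndexer" and the class name ExamineEvents are placeholders, the "__name" field just follows the double-underscore convention used earlier in the thread, and the namespace of LuceneExamineIndexer and DocumentWritingEventArgs is taken from the event name mentioned above and may differ depending on your Examine build.

```csharp
using Examine;
using Lucene.Net.Documents;
using umbraco.BusinessLogic;
using UmbracoExamine; // assumption: may live in the LuceneEngine namespace in newer builds

// Umbraco creates classes that inherit ApplicationBase at application
// start-up, so the constructor is a convenient place to attach handlers.
public class ExamineEvents : ApplicationBase
{
    public ExamineEvents()
    {
        // "MyIndexer" is a placeholder; use the indexer name from your Examine config.
        var indexer = ExamineManager.Instance.IndexProviderCollection["MyIndexer"]
            as LuceneExamineIndexer;

        if (indexer != null)
        {
            indexer.DocumentWriting += indexer_DocumentWriting;
        }
    }

    void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
    {
        // "name" is the property alias from the question; "__name" is a made-up field name.
        if (e.Fields.ContainsKey("name") && !string.IsNullOrEmpty(e.Fields["name"]))
        {
            // Stored and not analyzed, so "John Doe" keeps its casing.
            e.Document.Add(new Field("__name", e.Fields["name"],
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
    }
}
```

The null check means a mistyped index name, or an indexer of a different type, silently skips the wiring instead of throwing at start-up; log it there if you would rather find out.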
