  • trfletch 598 posts 604 karma points
    May 28, 2013 @ 18:17

    Querying against a comma separated list of IDs with examine and Lucene.net

    Hi,

    I have an Umbraco 4.11 website and I am trying to build a custom Examine (Lucene) index. I have a property on the nodes that holds a comma-separated list of category IDs, which I want to add to the index. I am pretty much trying to do exactly what is explained in the following article:

    http://stackoverflow.com/questions/5124183/querying-against-a-comma-separated-list-of-ids-with-examine-and-lucene-net

    The problem I have is when I am trying to create the index using the following code:

                // Loop through articles
                foreach (var a in articles)
                {
                    yield return new SimpleDataSet()
                    {
                        NodeDefinition = new Examine.IndexedNode()
                        {
                            NodeId = a.Id,
                            Type = "Article"
                            
                        },
                    RowData = new Dictionary<string, string>()
                    {
                        {"Name", a.Name},
                        {"Url", a.NiceUrl},
                        {"Category", "1234"},
                        {"Category", "5678"}
                    }
                    };
                }

    I received the following error:

    An item with the same key has already been added.

    Does anyone know how I can get multiple categories for each article into my Examine index?

    Regards Tony
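
    For context on the error above: a `Dictionary<string, string>` allows only one entry per key, so the second "Category" entry throws. Below is a minimal sketch, assuming it is acceptable to store all category IDs in a single space-separated value. The helper name `BuildRowData` is hypothetical and not part of Examine.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class RowDataExample
    {
        // Hypothetical helper: builds the row data for one article, storing every
        // category ID in a single "Category" entry separated by spaces.
        public static Dictionary<string, string> BuildRowData(
            string name, string url, IEnumerable<int> categoryIds)
        {
            return new Dictionary<string, string>
            {
                { "Name", name },
                { "Url", url },
                // One key with a joined value, so no duplicate key is ever added.
                { "Category", string.Join(" ", categoryIds) }
            };
        }

        public static void Main()
        {
            var row = BuildRowData("Example article", "/example-article", new[] { 1234, 5678 });
            Console.WriteLine(row["Category"]); // prints "1234 5678"
        }
    }
    ```

    Whether each ID can then be searched on its own depends on how the field is analysed when it is indexed.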

  • Aaron 14 posts 34 karma points
    Nov 13, 2013 @ 23:39

    Hey Tony,

    I am sure you have solved this by now, but I came across this post while working on a similar problem and thought I would share my findings.

    You can access the Lucene document in the OnDocumentWriting override when you create a custom indexer derived from BaseUmbracoIndexer:

    protected override void OnDocumentWriting(DocumentWritingEventArgs docArgs)
    {
        // Load the node that is currently being indexed.
        var currentNode = _nodeFactoryFacade.GetNode(docArgs.NodeId);

        // The property value is parsed as XML with one <nodeId> element per category.
        var categories = currentNode.GetProperty("categories").Value;
        if (!string.IsNullOrEmpty(categories))
        {
            var categoryNodeIdsXml = XElement.Parse(categories);
            var categoryNodeIds = categoryNodeIdsXml.Descendants("nodeId");
            foreach (var categoryNodeId in categoryNodeIds)
            {
                // A Lucene document can hold several fields with the same name, so add one per ID.
                docArgs.Document.Add(new Field("categories", categoryNodeId.Value, Field.Store.YES, Field.Index.ANALYZED));
            }
        }

        base.OnDocumentWriting(docArgs);
    }
    

    However, the issue then moves to searching: when you search, you get the same duplicate key exception. I worked around this by concatenating my values, separated by a pipe (|).

    I blogged about my findings if you want any more detail:

    http://blog.gravypower.net/examine-indexing-and-searching-with-multypart-properties/

    Hope that helps

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Nov 14, 2013 @ 08:51

    Guys,

    You can inject the categories into the index as a space-separated list, which can then be searched. I do something similar when I want to do path-based queries, e.g. all nodes that have parent 1234. The path is stored as CSV and is not tokenised, so I create a new field using GatheringNodeData and inject the path into it space-separated, and it all works nicely. BTW, I did a whole series of blog posts on Examiness: over 10 posts covering different hints and tips.

    Regards

    Ismail
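
    The space-separated path idea could look roughly like the sketch below. The field name `searchPath` and the helper `AddSearchablePath` are made up for illustration, and the note about calling it from a GatheringNodeData handler assumes the event arguments expose the node's fields as a string dictionary.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class PathFieldExample
    {
        // Hypothetical helper: copies the comma-separated "path" value into a new
        // "searchPath" field with spaces instead of commas, so that an analyzer
        // which splits on whitespace can index each ancestor ID as its own term.
        // In a GatheringNodeData handler this would be called with the event's
        // field dictionary (assumed here to be a Dictionary<string, string>).
        public static void AddSearchablePath(IDictionary<string, string> fields)
        {
            string path;
            if (fields.TryGetValue("path", out path) && !string.IsNullOrEmpty(path))
            {
                fields["searchPath"] = path.Replace(",", " ");
            }
        }

        public static void Main()
        {
            var fields = new Dictionary<string, string> { { "path", "-1,1050,1234,1300" } };
            AddSearchablePath(fields);
            Console.WriteLine(fields["searchPath"]); // prints "-1 1050 1234 1300"
        }
    }
    ```

    A query for the term 1234 against the new field would then be the way to find nodes below parent 1234, provided the field is indexed with an analyzer that keeps each ID as a separate term.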

  • Matt 76 posts 280 karma points
    Nov 15, 2013 @ 08:12

    Ismail, I'm using your GatheringNodeData technique to inject the data into the index, replacing "," with ", ". However, my issue is that incorrect results are being returned.

    Given this data:
    Blog Entry 1 has categories (array of tags) set to "symptom, something else" 
    Blog Entry 2 has categories set to "another symptom, something else"
    Blog Entry 3 has categories set to "symptom, something else" 

    When searching for:
    something else - get 3 results back - Correct 
    another symptom - get 1 result back - Correct
    symptom - get 3 results back - Incorrect, should be only 2 results

    I know what is happening and kind of why it is happening, but I don't know how to fix it. Any ideas? Can I separate with pipes and then include those in the search when I pass in the categories?

    Regards,
    Matt

  • Aaron 14 posts 34 karma points
    Nov 15, 2013 @ 09:14

    Hey Matt,

    I would say the issue is that Lucene is doing a text search and is picking up the word "symptom" from the "another symptom" category. I have been adding the ID of the node that represents each category instead; this way the name does not affect the search.

    There could be an issue when a node ID contains the ID of another category; I would have to look into that a bit more. But using IDs should solve your issue. Have a look at my blog post above to see what I ended up doing when dealing with categories.

    Hope that helps.

    Aaron
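
    To illustrate why category names over-match while IDs do not, here is a small simulation. It does not use Lucene at all: it simply lower-cases each field value and splits it on spaces and commas, as a rough stand-in for what an analyzer does. The category IDs (101, 102, 103) are invented for the example.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class TokenMatchExample
    {
        // Rough stand-in for an analyzer: lower-case the value and split it into terms.
        private static HashSet<string> Tokenize(string value)
        {
            return new HashSet<string>(
                value.ToLowerInvariant().Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries));
        }

        // Counts the entries whose field contains every term of the query.
        public static int CountMatches(IEnumerable<string> fieldValues, string query)
        {
            var queryTerms = Tokenize(query);
            return fieldValues.Count(v => queryTerms.IsSubsetOf(Tokenize(v)));
        }

        public static void Main()
        {
            // Category names from the three blog entries in the question.
            var byName = new[]
            {
                "symptom, something else",
                "another symptom, something else",
                "symptom, something else"
            };
            // Same entries with hypothetical IDs: 101 = symptom, 102 = another symptom, 103 = something else.
            var byId = new[] { "101 103", "102 103", "101 103" };

            Console.WriteLine(CountMatches(byName, "symptom")); // 3: "symptom" is also a term of "another symptom"
            Console.WriteLine(CountMatches(byId, "101"));       // 2: an ID matches only as a whole term
        }
    }
    ```

    In this simulation an ID such as 10 would not match 101, because terms are compared whole. Whether the same holds in a real index depends on the analyzer and the query type used.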

  • Matt 76 posts 280 karma points
    Nov 15, 2013 @ 09:20

    Thanks Aaron, I owe you a beer if you're ever in Seattle :-)

    Nice, simple solution, and it also solved my other issue of categories having stop words in them.

    Regards,
    Matt 

  • Aaron 14 posts 34 karma points
    Nov 16, 2013 @ 01:02

    No worries Matt, glad I can help. I find that when I put things into writing I understand them more.

    Aaron
