Querying against a comma-separated list of IDs with Examine and Lucene.Net
Hi,
I have an Umbraco 4.11 website and I am trying to build a custom Examine (Lucene) index. I have a property on the nodes that is a comma-separated list of category IDs that I want to add to the index. I am trying to do pretty much exactly what is explained in the following article:
http://stackoverflow.com/questions/5124183/querying-against-a-comma-separated-list-of-ids-with-examine-and-lucene-net
The problem I have is when I try to create the index using the following code:
Hey Tony,
I am sure you have solved this by now, but I came across this post while I was working on a similar problem and thought I would share my findings.
You can access the Lucene document in the OnDocumentWriting override when you create a custom indexer from BaseUmbracoIndexer:
using System.Xml.Linq;        // XElement
using Examine.LuceneEngine;   // DocumentWritingEventArgs
using Lucene.Net.Documents;   // Field

// Member of a custom indexer class that derives from BaseUmbracoIndexer.
protected override void OnDocumentWriting(DocumentWritingEventArgs docArgs)
{
    // _nodeFactoryFacade is the author's own wrapper for looking up Umbraco nodes.
    var currentNode = _nodeFactoryFacade.GetNode(docArgs.NodeId);
    var categoriesProperty = currentNode.GetProperty("categories");
    var categories = categoriesProperty != null ? categoriesProperty.Value : null;

    if (!string.IsNullOrEmpty(categories))
    {
        // The property value is XML holding one <nodeId> element per selected category.
        var categoryNodeIdsXml = XElement.Parse(categories);
        foreach (var categoryNodeId in categoryNodeIdsXml.Descendants("nodeId"))
        {
            // One "categories" field per ID; Lucene allows repeated field names in a document.
            docArgs.Document.Add(new Field("categories", categoryNodeId.Value, Field.Store.YES, Field.Index.ANALYZED));
        }
    }

    base.OnDocumentWriting(docArgs);
}
However, the issue is now with searching: when you search, you get the same duplicate key exception. I worked around this by concatenating my values into one field, separating them with a pipe (|).
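As a rough sketch of the pipe workaround, assuming the property holds the same <nodeId> XML that the indexer code above parses, the IDs can be flattened into a single value before the field is written, so each document carries only one "categories" field. The class and method names (CategoryFieldBuilder, JoinCategoryIds) and the <MultiNodePicker> root element in the example are hypothetical; in a real indexer the result would be added as one Field instead of one Field per ID.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CategoryFieldBuilder
{
    // Hypothetical helper: turns picker XML such as
    // <MultiNodePicker><nodeId>1050</nodeId><nodeId>1234</nodeId></MultiNodePicker>
    // into a single pipe-separated value, e.g. "1050|1234".
    public static string JoinCategoryIds(string pickerXml)
    {
        if (string.IsNullOrEmpty(pickerXml))
        {
            return string.Empty;
        }

        var ids = XElement.Parse(pickerXml)
            .Descendants("nodeId")
            .Select(n => n.Value.Trim())
            .Where(id => id.Length > 0);

        // A single value for a single field, so search results see one "categories" key.
        return string.Join("|", ids);
    }
}
```

For example, JoinCategoryIds("<MultiNodePicker><nodeId>1050</nodeId><nodeId>1234</nodeId></MultiNodePicker>") returns "1050|1234".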
Blogged about my findings if you want any more detail.
http://blog.gravypower.net/examine-indexing-and-searching-with-multypart-properties/
Hope that helps
Guys,
You can inject the categories into the index as a space-separated list, which can then be searched. I do something similar when I want to do path-based queries, e.g. all nodes that have parent 1234. The path is stored as CSV and is not tokenised, so I create a new field using GatheringNodeData and inject a space-separated path; it all works nicely. BTW, I did a whole series of blog posts here on Examiness, over 10 posts covering different hints and tips.
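The GatheringNodeData idea can be sketched independently of Umbraco: the handler sees the node's fields as name/value pairs and adds a new, tokenisable copy of a comma-separated value. In this sketch a plain IDictionary<string, string> stands in for the event's field collection (an assumption), and the field names "path" and "searchablePath" are illustrative; in a real site this logic would run inside the handler attached to the indexer's GatheringNodeData event.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchableFields
{
    // Copies a comma-separated field (e.g. a path like "-1,1050,1234")
    // into a new field whose values are separated by spaces ("-1 1050 1234"),
    // so an analyzer that splits on whitespace produces one term per ID.
    public static void AddSpaceSeparatedCopy(
        IDictionary<string, string> fields, string sourceField, string targetField)
    {
        string raw;
        if (!fields.TryGetValue(sourceField, out raw) || string.IsNullOrEmpty(raw))
        {
            return; // nothing to copy
        }

        var parts = raw.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                       .Select(p => p.Trim())
                       .Where(p => p.Length > 0);

        fields[targetField] = string.Join(" ", parts);
    }
}
```

Usage: with fields containing { "path", "-1,1050,1234" }, calling AddSpaceSeparatedCopy(fields, "path", "searchablePath") sets fields["searchablePath"] to "-1 1050 1234" and leaves the original "path" value untouched.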
Regards
Ismail
Ismail - I'm using your GatheringNodeData technique to inject the data into the index, replacing "," with ", ". However, my issue is that incorrect results are being returned.
Given this data:
Blog Entry 1 has categories (array of tags) set to "symptom, something else"
Blog Entry 2 has categories set to "another symptom, something else"
Blog Entry 3 has categories set to "symptom, something else"
When searching for:
something else - get 3 results back - Correct
another symptom - get 1 result back - Correct
symptom - get 3 results back - Incorrect, should be only 2 results
I know what is happening, and kind of why it is happening, but don't know how to fix it. Any ideas? Can I separate the values with pipes and then include those in the search when I pass in the categories?
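To see why "symptom" returns three entries, it helps to compare term-level matching with whole-category matching on the data above. This is a simplified model, not Lucene itself: the Tokenize method is an assumed stand-in for an analyzer that lowercases and splits on anything that is not a letter or digit, which is roughly what an analyzed field does to these category strings.

```csharp
using System;
using System.Linq;

public static class CategoryMatching
{
    // Simplified stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    public static string[] Tokenize(string text)
    {
        var cleaned = text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray();
        return new string(cleaned).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Term-level match: the entry matches if any token equals the search word.
    public static bool MatchesTerm(string categories, string word)
    {
        return Tokenize(categories).Contains(word.ToLowerInvariant());
    }

    // Whole-category match: split on commas and compare complete category names.
    public static bool MatchesCategory(string categories, string category)
    {
        return categories.Split(',')
            .Select(c => c.Trim())
            .Any(c => string.Equals(c, category, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the three blog entries, MatchesTerm(entry, "symptom") is true for all 3, because "another symptom" also yields the token "symptom", while MatchesCategory(entry, "symptom") is true for only 2. Swapping commas for pipes does not change the term-level result as long as the analyzer still splits on whitespace, since the words inside "another symptom" remain separate terms.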
Regards,
Matt
Hey Matt,
I would say the issue is that Lucene is doing a text search and is picking up the word "symptom" out of the "another symptom" category. I have been adding the ID of the node that represents a category; this way the name does not affect the search.
There could be an issue when a node ID contains the ID of another category; I would have to look into that a bit more, but using IDs should solve your issue. Have a look at my blog post above to see what I ended up doing when dealing with categories.
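Indexing category node IDs instead of names can be sketched with the same simplified model. The IDs used below (1101, 1102, 1103, 11011) are made up for illustration. An indexed term has to equal the query term exactly (Lucene term queries do not match substrings), so an ID such as 1101 does not match a different ID that merely contains it, at least as long as the search is an exact term lookup rather than a wildcard query.

```csharp
using System;
using System.Linq;

public static class CategoryIdMatching
{
    // Builds the value that would be indexed: category node IDs separated by spaces.
    public static string ToIndexedValue(params int[] categoryIds)
    {
        return string.Join(" ", categoryIds.Select(id => id.ToString()));
    }

    // Exact term comparison: 1101 matches the term "1101" but not "11011".
    public static bool HasCategory(string indexedValue, int categoryId)
    {
        var terms = indexedValue.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return terms.Contains(categoryId.ToString());
    }
}
```

Assuming symptom = 1101, another symptom = 1102 and something else = 1103, the three entries are indexed as "1101 1103", "1102 1103" and "1101 1103", and HasCategory(value, 1101) is true for exactly two of them. HasCategory("11011", 1101) is false.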
Hope that helps.
Aaron
Thanks Aaron - I owe you a beer if you're ever in Seattle :-)
Nice simple solution and also solved my other issues of categories having stop words in them.
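The stop-word point can be illustrated the same way. An analyzer with a stop-word filter drops common English words from analyzed text, so those words in a category name never become searchable terms, whereas a numeric node ID passes through unchanged. The stop list below is a small hand-picked subset used only for illustration; the actual list depends on the analyzer the index is configured with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordDemo
{
    // Illustrative subset of common English stop words (not any analyzer's full list).
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "in", "of", "or", "the", "to" };

    // Lowercases, splits on spaces and commas, then drops stop words,
    // roughly mimicking what a stop-word-filtering analyzer does to a field.
    public static string[] Analyze(string text)
    {
        return text.ToLowerInvariant()
            .Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(t => !StopWords.Contains(t))
            .ToArray();
    }
}
```

For example, Analyze("Fish and Chips") yields "fish" and "chips" (the "and" is gone), while Analyze("1234") yields "1234" unchanged.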
Regards,
Matt
No worries Matt, glad I can help. I find that when I put things into writing I understand them more.
Aaron