To be able to present our users with relevant search results, we use a mixture of elaborate techniques. (yeah, right)
The most important ones are filtering by path and by composition type. With the built-in indexer, we have to build a big graph of compositions to find the inheriting nodeTypeAliases, and we have to search the path by wildcard.
Instead of doing that, we just analyze the path and composition. We have a custom analyzer that delegates to this "ISearchDataGatherer":
using System;
using System.Collections.Generic;
using System.Linq;
using Examine;
using Umbraco.Core;
using Umbraco.Core.Models;

public class PathAndCompositionSearchDataGatherer : ISearchDataGatherer
{
    // Content types already looked up, keyed by id, so each type is fetched only once.
    private readonly Dictionary<int, IContentTypeComposition> contentTypes = new Dictionary<int, IContentTypeComposition>();

    public void GatherNodeData(IndexingNodeDataEventArgs e)
    {
        // "-1,1050,1062" becomes "-1 1050 1062", so every ancestor id is its own term.
        var pathAtt = e.Node.Attribute("path");
        var path = pathAtt != null ? pathAtt.Value : "";
        e.Fields.Add("analyzedPath", path.Replace(',', ' '));

        var contentTypeAtt = e.Node.Attribute("nodeType");
        var contentTypeId = contentTypeAtt != null ? Convert.ToInt32(contentTypeAtt.Value) : -1;
        var composition = "";
        if (contentTypeId > -1)
        {
            IContentTypeComposition contentType;
            if (!contentTypes.TryGetValue(contentTypeId, out contentType))
            {
                var service = ApplicationContext.Current.Services.ContentTypeService;
                // The id belongs to either a document type or a media type.
                contentType =
                    (IContentTypeComposition) service.GetContentType(contentTypeId) ??
                    (IContentTypeComposition) service.GetMediaType(contentTypeId);
                contentTypes.Add(contentTypeId, contentType);
            }
            // Guard against ids that resolve to neither kind of type.
            if (contentType != null)
            {
                composition = String.Join(" ", contentType.CompositionAliases().Union(new[] {contentType.Alias}));
            }
        }
        e.Fields.Add("analyzedComposition", composition);
    }
}
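To make it a bit more concrete what ends up in the two fields and how they could be queried, here is a minimal standalone sketch. It has no Umbraco or Examine dependency; the class name AnalyzedFieldDemo, the ids and the aliases are made up for illustration, and the raw Lucene query assumes the fields are indexed with an analyzer that splits on whitespace (depending on the analyzer, the alias may also need lowercasing).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Umbraco or Examine.
public static class AnalyzedFieldDemo
{
    // "-1,1050,1062" -> "-1 1050 1062": each ancestor id becomes a separate term.
    public static string ToAnalyzedPath(string path)
    {
        return (path ?? "").Replace(',', ' ');
    }

    // Composition aliases followed by the type's own alias, space separated.
    public static string ToAnalyzedComposition(string alias, IEnumerable<string> compositionAliases)
    {
        return string.Join(" ", compositionAliases.Union(new[] { alias }));
    }

    // Raw Lucene query: nodes below a given ancestor that use a given composition.
    // Exact term matches on both fields, so no wildcard on the path is needed.
    public static string BuildQuery(int ancestorId, string compositionAlias)
    {
        return string.Format("+analyzedPath:{0} +analyzedComposition:{1}", ancestorId, compositionAlias);
    }
}
```

With these assumptions, ToAnalyzedPath("-1,1050,1062") gives "-1 1050 1062", and BuildQuery(1050, "seoComposition") gives "+analyzedPath:1050 +analyzedComposition:seoComposition".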
Is this something that would be worth pulling into the UmbracoContentIndexer?
I'll make the PR; I just want to know if it's useful enough for the core.
The main issue I see with this is performance: for every single node that is indexed, for every indexer, you are going to be querying the database and really, really, really hoping that you're going to get a cached result back.
To do this 'correctly', this data would be part of the main lookup when re-indexing a node or when rebuilding the index. The issue with that is the Examine v1.0 limitation with the silly XML instance (but you know that you can work around that too by adding additional XML attributes if necessary). v2.0 will be much better suited because we can add any info we want up-front.
Analyzing path and composition worthy of core?
I could possibly have hacked up some content type cache before any indexing, but I see the challenges involved in making this work well on any site.
But I'll happily hold off on this PR until 8.0/2.0. :)
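For what it's worth, the "content type cache before any indexing" idea could look roughly like the sketch below. It is only an assumption of how it might be done: CompositionAliasCache is a hypothetical class, and the tuples stand in for whatever one up-front lookup of all content and media types would return (id, own alias, composition aliases). It does not handle content types changing after the cache is built, which is one of the challenges mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Umbraco or Examine.
public sealed class CompositionAliasCache
{
    // Filled once in the constructor and never modified afterwards,
    // so concurrent reads from several indexers are safe.
    private readonly Dictionary<int, string> compositionsById;

    // Each tuple is (content type id, own alias, aliases of its compositions).
    public CompositionAliasCache(IEnumerable<Tuple<int, string, string[]>> contentTypes)
    {
        compositionsById = contentTypes.ToDictionary(
            t => t.Item1,
            t => string.Join(" ", t.Item3.Union(new[] { t.Item2 })));
    }

    // Returns the precomputed field value, or "" for an unknown id,
    // so gathering data for a node never triggers a lookup of its own.
    public string GetAnalyzedComposition(int contentTypeId)
    {
        string value;
        return compositionsById.TryGetValue(contentTypeId, out value) ? value : "";
    }
}
```

The gatherer would then only call GetAnalyzedComposition(contentTypeId) per node, and all the expensive work happens once before a rebuild starts.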