
  • Bo Jacobsen 219 posts 1055 karma points
    Apr 11, 2019 @ 09:28

    How to create a custom Examine index with specified fields to include in Umbraco 8

    Hi all.

    Using Umbraco 8.0.1

    How can I specify which fields to include, so that the index does not automatically take every field from the defined document type aliases?

    The Umbraco.Examine.ContentValueSetValidator always sets IncludeFields and ExcludeFields to null, and when I define my own ContentValueSetValidator, it ignores the fields I put in the IncludeFields array. https://github.com/umbraco/Umbraco-CMS/blob/v8/dev/src/Umbraco.Examine/ContentValueSetValidator.cs

    The Umbraco.Examine.UmbracoFieldDefinitionCollection seems to add the fields, but when I define my own, it breaks. https://github.com/umbraco/Umbraco-CMS/blob/v8/dev/src/Umbraco.Examine/UmbracoFieldDefinitionCollection.cs

    public class ContentSearchIndexCreator : LuceneIndexCreator, IUmbracoIndexesCreator
    {
        private readonly IProfilingLogger _profilingLogger;
        private readonly ILocalizationService _languageService;
    
        public ContentSearchIndexCreator(IProfilingLogger profilingLogger, ILocalizationService languageService)
        {
            _profilingLogger = profilingLogger;
            _languageService = languageService;
        }
    
        public override IEnumerable<IIndex> Create()
        {
            return new[]
            {
                CreateContentIndex(
                    "ContentSearchIndex",
                    "ContentSearch",
                    new UmbracoFieldDefinitionCollection(),
                    new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
                    new ContentValueSetValidator(true, true, null, null, new string[] { "TextPage", "NumberPage" }, null)
                )
            };
        }
    
        private IIndex CreateContentIndex(
            string name,
            string folderName,
            FieldDefinitionCollection fieldDefinitionCollection,
            Lucene.Net.Analysis.Analyzer luceneAnalyzer,
            IContentValueSetValidator contentValueSetValidator)
        {
            var index = new UmbracoContentIndex(
                name,
                CreateFileSystemLuceneDirectory(folderName),
                fieldDefinitionCollection,
                luceneAnalyzer,
                _profilingLogger,
                _languageService,
                contentValueSetValidator);
    
            return index;
        }
    }
    
  • Corné Strijkert 61 posts 398 karma points
    Apr 11, 2019 @ 11:42

    Hi Bo,

    I did a quick investigation, and maybe the following will point you in the right direction.

    When you implement your own ContentValueSetValidator, you can exclude fields from being indexed in the Validate(ValueSet valueSet) method.

    With valueSet.Values.Remove(key) you can remove values from the value set.
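
    To illustrate the idea, here is a minimal sketch that uses a plain dictionary as a stand-in for valueSet.Values. It is not the Examine API itself: the class name FieldFilterSketch and the method KeepOnly are hypothetical, and the dictionary shape (string keys mapping to lists of values) is an assumption about how the value set stores its fields.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper: a plain dictionary stands in for valueSet.Values.
    public static class FieldFilterSketch
    {
        // Removes every entry whose key is not in includeFields.
        // Returns true if at least one entry was removed.
        public static bool KeepOnly(IDictionary<string, List<object>> values, ISet<string> includeFields)
        {
            var removed = false;

            // Snapshot the keys first so the dictionary is not modified
            // while it is being enumerated.
            foreach (var key in values.Keys.ToList())
            {
                if (!includeFields.Contains(key))
                {
                    values.Remove(key);
                    removed = true;
                }
            }

            return removed;
        }
    }
    ```

    A validator built on this idea would presumably report a "filtered" result whenever KeepOnly returns true, and it would need to keep any system fields that later checks still read.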

    I think the UmbracoFieldDefinitionCollection doesn't determine which fields are actually included in the index. It is more of a mapping between Umbraco fields and Examine field types. The comment above this code says:

    A type that defines the type of index for each Umbraco field (non user defined fields)

    https://github.com/umbraco/Umbraco-CMS/blob/853087a75044b814df458457dc9a1f778cc89749/src/Umbraco.Examine/UmbracoFieldDefinitionCollection.cs

    To be continued..

  • Bo Jacobsen 219 posts 1055 karma points
    Apr 12, 2019 @ 08:35

    Hi Corné

    I got it working by writing my own ContentValueSetValidator, but I'm not sure I'm happy with this way of doing it.

    public class ContentValueSetValidator : ValueSetValidator, IContentValueSetValidator
    {
        private readonly IPublicAccessService _publicAccessService;
    
        private const string PathKey = "path";
        private static readonly IEnumerable<string> ValidCategories = new[] { IndexTypes.Content, IndexTypes.Media };
        protected override IEnumerable<string> ValidIndexCategories => ValidCategories;
    
        public bool PublishedValuesOnly { get; }
        public bool SupportProtectedContent { get; }
        public int? ParentId { get; }
    
    
        public ContentValueSetValidator(bool publishedValuesOnly, int? parentId = null, IEnumerable<string> includeItemTypes = null, IEnumerable<string> excludeItemTypes = null, IEnumerable<string> includeFields = null, IEnumerable<string> excludeFields = null)
            : this(publishedValuesOnly, true, null, parentId, includeItemTypes, excludeItemTypes, includeFields, excludeFields)
        {
        }
    
        public ContentValueSetValidator(bool publishedValuesOnly, bool supportProtectedContent, IPublicAccessService publicAccessService, int? parentId = null, IEnumerable<string> includeItemTypes = null, IEnumerable<string> excludeItemTypes = null, IEnumerable<string> includeFields = null, IEnumerable<string> excludeFields = null)
            : base(includeItemTypes, excludeItemTypes, includeFields, excludeFields)
        {
            PublishedValuesOnly = publishedValuesOnly;
            SupportProtectedContent = supportProtectedContent;
            ParentId = parentId;
            _publicAccessService = publicAccessService;
        }
    
    
        public bool ValidatePath(string path, string category)
        {
            //check if this document is a descendent of the parent
            if (ParentId.HasValue && ParentId.Value > 0)
            {
                // we cannot return FAILED here because we need the value set to get into the indexer and then deal with it from there
                // because we need to remove anything that doesn't pass by parent Id in the cases that umbraco data is moved to an illegal parent.
                if (!path.Contains(string.Concat(",", ParentId.Value, ",")))
                    return false;
            }
    
            return true;
        }
    
        public bool ValidateRecycleBin(string path, string category)
        {
            var recycleBinId = category == IndexTypes.Content ? Constants.System.RecycleBinContent : Constants.System.RecycleBinMedia;
    
            //check for recycle bin
            if (PublishedValuesOnly)
            {
                if (path.Contains(string.Concat(",", recycleBinId, ",")))
                    return false;
            }
            return true;
        }
    
        public bool ValidateProtectedContent(string path, string category)
        {
            if (category == IndexTypes.Content
                && !SupportProtectedContent
                // if the service is null we can't look this up so we'll return false
                && (_publicAccessService == null || _publicAccessService.IsProtected(path)))
            {
                return false;
            }
    
            return true;
        }
    
        public override ValueSetValidationResult Validate(ValueSet valueSet)
        {
            // Removed base.Validate(valueSet) in order to manipulate the valueSet.Values the way we want to.
    
            if (ValidIndexCategories != null && !ValidIndexCategories.InvariantContains(valueSet.Category))
            {
                return ValueSetValidationResult.Failed;
            }
    
            // check if this document is of a correct type of node type alias
            if (IncludeItemTypes != null && !IncludeItemTypes.InvariantContains(valueSet.ItemType))
            {
                return ValueSetValidationResult.Failed;
            }
    
            // if this node type is part of our exclusion list
            if (ExcludeItemTypes != null && ExcludeItemTypes.InvariantContains(valueSet.ItemType))
            {
                return ValueSetValidationResult.Failed;
            }
    
            ValueSetValidationResult baseValidateResult = ValueSetValidationResult.Valid;
    
            // Checking IncludeFields and ExcludeFields for exact key name or culture name.
            foreach (var key in valueSet.Values.Keys.ToList())
            {
                if (IncludeFields != null && !IncludeFields.Any(x => x.Equals(key) || key.StartsWith($"{x}_")))
                {
                    valueSet.Values.Remove(key); //remove any value with a key that doesn't match the inclusion list
                    baseValidateResult = ValueSetValidationResult.Filtered;
                }
    
                if (ExcludeFields != null && ExcludeFields.Any(x => x.Equals(key) || key.StartsWith($"{x}_")))
                {
                    valueSet.Values.Remove(key); //remove any value with a key that matches the exclusion list
                    baseValidateResult = ValueSetValidationResult.Filtered;
                }
            }
    
            var isFiltered = baseValidateResult == ValueSetValidationResult.Filtered;
    
            //check for published content
            if (valueSet.Category == IndexTypes.Content && PublishedValuesOnly)
            {
                if (!valueSet.Values.TryGetValue(UmbracoExamineIndex.PublishedFieldName, out var published))
                    return ValueSetValidationResult.Failed;
    
                if (!published[0].Equals("y"))
                    return ValueSetValidationResult.Failed;
    
    
                //deal with variants, if there are unpublished variants then we need to remove them from the value set
                if (valueSet.Values.TryGetValue(UmbracoContentIndex.VariesByCultureFieldName, out var variesByCulture)
                    && variesByCulture.Count > 0 && variesByCulture[0].Equals("y"))
                {
                    //so this valueset is for a content that varies by culture, now check for non-published cultures and remove those values
                    foreach (var publishField in valueSet.Values.Where(x => x.Key.StartsWith($"{UmbracoExamineIndex.PublishedFieldName}_")).ToList())
                    {
                        if (publishField.Value.Count <= 0 || !publishField.Value[0].Equals("y"))
                        {
                            //this culture is not published, so remove all of these culture values
                            var cultureSuffix = publishField.Key.Substring(publishField.Key.LastIndexOf('_'));
                            foreach (var cultureField in valueSet.Values.Where(x => x.Key.InvariantEndsWith(cultureSuffix)).ToList())
                            {
                                valueSet.Values.Remove(cultureField.Key);
                                isFiltered = true;
                            }
                        }
                    }
                }
            }
    
            //must have a 'path'
            if (!valueSet.Values.TryGetValue(PathKey, out var pathValues)) return ValueSetValidationResult.Failed;
            if (pathValues.Count == 0) return ValueSetValidationResult.Failed;
            if (pathValues[0] == null) return ValueSetValidationResult.Failed;
            if (pathValues[0].ToString().IsNullOrWhiteSpace()) return ValueSetValidationResult.Failed;
            var path = pathValues[0].ToString();
    
            // We need to validate the path of the content based on ParentId, protected content and recycle bin rules.
            // We cannot return FAILED here because we need the value set to get into the indexer and then deal with it from there
            // because we need to remove anything that doesn't pass by protected content in the cases that umbraco data is moved to an illegal parent.
            if (!ValidatePath(path, valueSet.Category)
                || !ValidateRecycleBin(path, valueSet.Category)
                || !ValidateProtectedContent(path, valueSet.Category))
                return ValueSetValidationResult.Filtered;
    
            return isFiltered ? ValueSetValidationResult.Filtered : ValueSetValidationResult.Valid;
        }
    }
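
    The field check in the loop above can be sketched on its own, outside Umbraco. This is a minimal illustration that assumes culture variants of a field are stored under the field name plus an underscore suffix (for example nodeName_en-us), which is what the StartsWith test implies. The names FieldMatchSketch and Matches are hypothetical, and the sketch uses ordinal comparison throughout, whereas the validator above relies on the default comparisons.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper that isolates the include/exclude matching rule.
    public static class FieldMatchSketch
    {
        // True when key equals one of the listed fields, or is a listed
        // field followed by "_" and any suffix (e.g. a culture code).
        public static bool Matches(IEnumerable<string> fields, string key)
        {
            return fields.Any(x => key.Equals(x, StringComparison.Ordinal)
                                || key.StartsWith(x + "_", StringComparison.Ordinal));
        }
    }
    ```

    One consequence of the prefix rule: listing "path" also matches any key that starts with "path_", so field aliases that share a prefix up to an underscore are included or excluded together.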
    

    Then I use it in a custom LuceneIndexCreator.

    public class ContentSearchIndexCreator : LuceneIndexCreator, IUmbracoIndexesCreator
    {
        private readonly IProfilingLogger _profilingLogger;
        private readonly ILocalizationService _languageService;
    
        public ContentSearchIndexCreator(IProfilingLogger profilingLogger, ILocalizationService languageService)
        {
            _profilingLogger = profilingLogger;
            _languageService = languageService;
        }
    
        public override IEnumerable<IIndex> Create()
        {
            return new[]
            {
                CreateContentIndex(
                    "ContentSearchIndex",
                    "ContentSearch",
                    new UmbracoFieldDefinitionCollection(),
                    new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
                    new ValueSetValidators.ContentValueSetValidator(
                        true, // publishedValuesOnly
                        true, // supportProtectedContent
                        null, // publicAccessService
                        null, // parentId
                        new string[] { "TextPage", "RedirectNode" }, // includeItemTypes
                        null, // excludeItemTypes
                        new string[] { "__IndexType", "__Published", "__Key", "__Path", "__VariesByCulture", "__NodeId", "id", "path", "nodeName", "pageGrid", "searchTags", "searchablePath" }, // includeFields
                        null) // excludeFields
                )
            };
        }
    
        private IIndex CreateContentIndex(
            string name,
            string folderName,
            FieldDefinitionCollection fieldDefinitionCollection,
            Lucene.Net.Analysis.Analyzer luceneAnalyzer,
            IContentValueSetValidator contentValueSetValidator)
        {
            var index = new UmbracoContentIndex(
                name,
                CreateFileSystemLuceneDirectory(folderName),
                fieldDefinitionCollection,
                luceneAnalyzer,
                _profilingLogger,
                _languageService,
                contentValueSetValidator);
    
            return index;
        }
    }
    

    As bonus info, I added the searchablePath field using an IComponent.

    public class ExamineLuceneComponent : IComponent
    {
        private readonly IExamineManager _examineManager;
        private readonly ILogger _logger;
    
        public ExamineLuceneComponent(IExamineManager examineManager, ILogger logger)
        {
            _logger = logger;
            _examineManager = examineManager;
        }
    
        public void Initialize()
        {
            var externalIndex = _examineManager.Indexes.FirstOrDefault(x => x.Name == "ContentSearchIndex");
            if (externalIndex != null)
            {
                ((BaseIndexProvider)externalIndex).TransformingIndexValues += ExamineLuceneComponent_TransformingIndexValues;
            }
        }
    
        private void ExamineLuceneComponent_TransformingIndexValues(object sender, IndexingItemEventArgs e)
        {
            if (e.ValueSet.Category == IndexTypes.Content)
            {
                try
                {
                    var value = e.ValueSet.Values.Where(x => x.Key == "path").Select(x => x.Value).FirstOrDefault();
                    if (value != null && value.Any())
                    {
                        var list = new List<object>();
                        var path = value.First().ToString().Replace(",", " ");
                        list.Add(path);
    
                        var searchablePath = e.ValueSet.Values.FirstOrDefault(x => x.Key == "searchablePath");
                        if (searchablePath.Key != null)
                        {
                            searchablePath.Value.Clear();
                            // Add the path string itself; adding `list` would nest a list inside the values.
                            searchablePath.Value.Add(path);
                        }
                        else
                        {
                            e.ValueSet.Values.Add("searchablePath", list);
                        }
                    }
                }
                catch (Exception ex)
                {
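                    // Pass the exception as the exception argument so it is logged with its stack trace.
                    _logger.Error<ExamineLuceneComponent>(ex, "Error munging fields for {ValueSetId}", e.ValueSet.Id);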
                }
            }
        }
    
        public void Terminate() { }
    }
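
    The transformation in the component boils down to replacing the commas in the path with spaces, presumably so that each ancestor id ends up as its own searchable term. Here is a minimal, framework-free sketch of that step. The names SearchablePathSketch, ToSearchablePath and ContainsId are hypothetical, and ContainsId only imitates a whole-term match in plain .NET; it is not an Examine query.

    ```csharp
    using System;
    using System.Globalization;
    using System.Linq;

    // Hypothetical helper showing the path transformation outside Umbraco.
    public static class SearchablePathSketch
    {
        // Turns a comma-separated node path such as "-1,1050,1060"
        // into a space-separated one: "-1 1050 1060".
        public static string ToSearchablePath(string path)
        {
            return path.Replace(",", " ");
        }

        // True when nodeId appears as a whole id in the space-separated path.
        public static bool ContainsId(string searchablePath, int nodeId)
        {
            return searchablePath
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(nodeId.ToString(CultureInfo.InvariantCulture));
        }
    }
    ```

    With the ids separated like this, a search for one ancestor id can match every descendant of that node without matching ids that merely share digits with it.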
    

    Then it gives these results:

    [Four screenshots of the results]

    The next step is to figure out how to include PDF and Word files.

  • Jo Kendal 24 posts 166 karma points
    27 days ago

    Hi

    I have just arrived at the requirement for search on a new build in U8.

    This is all very different from U7!

    I have got the regular indexes working - did you make any progress on Word/PDF indexing? I can't find anything out there presently.

  • Bo Jacobsen 219 posts 1055 karma points
    6 days ago

    Hi Jo Kendal.

    No luck with the file indexing yet.

  • Jo Kendal 24 posts 166 karma points
    6 days ago

    Hi

    I did get it working. Sorry. I'm pretty snowed under just now but I intend to post when I can.
