  • Alexandre Locas 25 posts 116 karma points
    Aug 19, 2019 @ 15:28

    V8 : DefaultAnalyzer on custom index

    Hi, I have created a custom index and I am trying to use a custom Lucene analyzer. My custom analyzer (CustomAnalyzer) is displayed in the back office, as the screenshot shows, but the code in it is never executed.

    Has anyone been successful doing this kind of thing?

    Thank you

  • Dimitris Delegkos 2 posts 72 karma points
    3 days ago

    Hello

    I am dealing with the same issue.

    Did you ever manage to solve it?

    Thank you

  • Alexandre Locas 25 posts 116 karma points
    3 days ago

    Hi, I did not solve it. We ended up doing external indexing (with Solr) with our own code, bypassing everything Examine-related.

  • Ismail Mayat 4377 posts 9504 karma points MVP 2x admin c-trib
    3 days ago

    Can you show the code of your IndexCreator? That is where you set up the analyser. I have a custom index with a custom analyser, and my code looks like this:

    public class HighQIndexCreator : LuceneIndexCreator
    {
        public override IEnumerable<IIndex> Create()
        {
            // The sectors and practice areas are multi-value fields. We analyse them with
            // KeywordAnalyzer to help with tag searches: it does not tokenize on spaces and
            // does not remove stop words, otherwise any tag containing "And" would not work.
            PerFieldAnalyzerWrapper customAnalyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
    
            customAnalyzer.AddAnalyzer(HiqhQIndexConstants.IndexFieldAliases.TagPrefix + HiqhQIndexConstants.IndexFieldAliases.Sectors, 
                new KeywordAnalyzer()); 
    
            customAnalyzer.AddAnalyzer(HiqhQIndexConstants.IndexFieldAliases.TagPrefix + HiqhQIndexConstants.IndexFieldAliases.PracticeAreas, 
                new KeywordAnalyzer());
    
            customAnalyzer.AddAnalyzer(HiqhQIndexConstants.IndexFieldAliases.TagPrefix + HiqhQIndexConstants.IndexFieldAliases.Authors, 
                new KeywordAnalyzer());
    
            var index = new LuceneIndex(HiqhQIndexConstants.ExamineIndexName,
                CreateFileSystemLuceneDirectory(HiqhQIndexConstants.ExamineIndexName),
                new FieldDefinitionCollection(
    
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Title, FieldDefinitionTypes.FullTextSortable),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Description, FieldDefinitionTypes.FullText),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Content, FieldDefinitionTypes.FullText),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Contents, FieldDefinitionTypes.FullText),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.PublicationDate, FieldDefinitionTypes.DateTime),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Url, FieldDefinitionTypes.Raw),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Author, FieldDefinitionTypes.FullText),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Authors+"Raw", FieldDefinitionTypes.FullText),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.AuthorEmail,FieldDefinitionTypes.EmailAddress),
                    new FieldDefinition(HiqhQIndexConstants.IndexFieldAliases.Id, FieldDefinitionTypes.Integer)
                ),
                customAnalyzer);
    
            return new[] { index };
        }
    }
    

    Regards

    Ismail

  • Dimitris Delegkos 2 posts 72 karma points
    3 days ago

    Hello Ismail

    I studied your code and I think that I have implemented something similar.

    I am pasting parts of my code below. Please note that I am working on Umbraco Cloud 8.5.1.

    My index creator:

    public override IEnumerable<IIndex> Create()
    {
        return new[]
        {
            CreateCustomIndex()
        };
    }

    private IIndex CreateCustomIndex()
    {
        var index = new UmbracoContentIndex(Constants.UmbracoIndexes.WebsiteCustomIndexName,
            CreateFileSystemLuceneDirectory(Constants.UmbracoIndexes.WebsiteCustomIndexPath),
            new UmbracoFieldDefinitionCollection(),
            new CustomLuceneAnalyzer(),
            ProfilingLogger,
            LanguageService,
            new ContentValueSetValidator(true));

        return index;
    }
    

    My analyzer:

    public class CustomLuceneAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            NormalizeCharMap map = new NormalizeCharMap();
    
            map.Add("ά", "α");
            map.Add("έ", "ε");
            map.Add("ί", "ι");
            map.Add("ό", "ο");
            map.Add("ύ", "υ");
            map.Add("ή", "η");
            map.Add("ώ", "ω");
            map.Add("ς", "σ");
            map.Add("Ά", "Α");
            map.Add("Έ", "ε");
            map.Add("Ί", "ι");
            map.Add("Ό", "ο");
            map.Add("Ύ", "υ");
            map.Add("Ή", "η");
            map.Add("Ώ", "ω");
    
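            // Apply the character mappings to the input before tokenizing; a
            // MappingCharFilter that is constructed but never read from has no effect.
            var mappedReader = new MappingCharFilter(map, reader);

            StandardTokenizer tokenizer = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_30, mappedReader);
            tokenizer.MaxTokenLength = 255;
            TokenStream stream = new StandardFilter(tokenizer);
            stream = new LowerCaseFilter(stream);
            stream = new ASCIIFoldingFilter(stream);
            return stream;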
        }
    }
    

    The goal of the implementation is to support searching for words with accents and for word derivatives in the Greek language (e.g. if the keyword is science, I want to get results that include the word scientific).

    Thank you

  • Ismail Mayat 4377 posts 9504 karma points MVP 2x admin c-trib
    3 days ago

    Dimitris,

    I have not tried it on Cloud; this is on an Azure web app, with Umbraco 8.3.0.

    With regard to your problem, I was a little confused about what you are trying to solve. If the issue is ASCII folding, i.e. searches in Greek not working, then as long as you are not doing wildcard searches the standard analyser will already ASCII-flatten for you during indexing and searching, and it should all work.

    If you are wildcard searching, then at search time you could ASCII-flatten the query first and then search. I have done this in the past.
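
    A minimal sketch of flattening a query term before it is passed to the searcher, assuming plain .NET Unicode normalisation rather than any Examine or Lucene.Net API. The class and method names (QueryFlattener, Flatten) are made up for illustration. It removes combining accent marks, which covers the Greek tonos as well as Latin accents, and maps the final sigma to a regular sigma in the same way as the "ς" entry in the analyzer posted above:

    ```csharp
    using System.Globalization;
    using System.Text;

    // Hypothetical helper; not part of Examine or Lucene.Net.
    public static class QueryFlattener
    {
        // Returns the term lowercased and without accent marks, so that it can
        // match text that was indexed without accents. Wildcard characters such
        // as * and ? are left untouched.
        public static string Flatten(string term)
        {
            if (string.IsNullOrEmpty(term)) return term;

            // Decompose each accented letter into base letter + combining mark.
            string decomposed = term.Normalize(NormalizationForm.FormD);

            var sb = new StringBuilder(decomposed.Length);
            foreach (char c in decomposed)
            {
                // Drop the combining marks and keep every other character.
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    sb.Append(c);
            }

            // Recompose, lowercase, and treat final sigma like a regular sigma.
            return sb.ToString()
                     .Normalize(NormalizationForm.FormC)
                     .ToLowerInvariant()
                     .Replace('ς', 'σ');
        }
    }
    ```

    For example, QueryFlattener.Flatten("Επιστήμη*") gives "επιστημη*". This only helps if the indexed text went through an equivalent normalisation, so the index side and the query side need to agree.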

    Also, is there any reason why you are not using the Greek analyser?

    https://github.com/apache/lucenenet/tree/3.0.3/src/contrib/Analyzers/El

    Regarding your point:

    if the keyword is science I want to get results including the word scientific

    This would imply stemming? You could use a Porter stemmer in your custom analyser?

    Coming back to your original point about the code not firing: have you tested locally with a debugger? Does it hit a breakpoint? If not, it could be something to do with composition ordering.
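
    For checking the registration side, here is a rough sketch of how a custom index creator is usually wired up in Umbraco 8. The names CustomIndexCreator, CustomIndexComponent and CustomIndexComposer are placeholders: CustomIndexCreator stands for your own LuceneIndexCreator subclass and is not defined here. Treat this as an assumption about your setup, not as your actual code:

    ```csharp
    using Examine;
    using Umbraco.Core;
    using Umbraco.Core.Composing;

    // Placeholder component: adds every index returned by the creator to Examine.
    public class CustomIndexComponent : IComponent
    {
        private readonly IExamineManager _examineManager;
        private readonly CustomIndexCreator _indexCreator; // your LuceneIndexCreator subclass

        public CustomIndexComponent(IExamineManager examineManager, CustomIndexCreator indexCreator)
        {
            _examineManager = examineManager;
            _indexCreator = indexCreator;
        }

        public void Initialize()
        {
            // A breakpoint here shows whether the index, and with it the
            // analyzer instance, is created at startup at all.
            foreach (var index in _indexCreator.Create())
                _examineManager.AddIndex(index);
        }

        public void Terminate() { }
    }

    // Placeholder composer: registers the creator and appends the component.
    [RuntimeLevel(MinLevel = RuntimeLevel.Run)]
    public class CustomIndexComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.RegisterUnique<CustomIndexCreator>();
            composition.Components().Append<CustomIndexComponent>();
        }
    }
    ```

    If the breakpoint in Initialize is hit but the one in the analyzer's TokenStream method is not, the index is registered and the thing to look at is whether anything has been indexed or searched through that index since startup.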
