  • Matthew Wise 271 posts 1373 karma points MVP 4x c-trib
    Jul 17, 2018 @ 16:36

    Examine autocomplete / suggest

    Hi,

    I am trying to create a site search that suggests search terms as the user types.

    The term may be multiple words and may appear within bodyText and other fields.

    I have found the following answer, but I am not sure how I would achieve the same thing through Examine, as the code builds its index from another index.

    https://stackoverflow.com/a/9183416/503301

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jul 18, 2018 @ 09:26

    As a quick guess: build the AutoCompleteAnalyzer from that Stack Overflow post. Then, in Examine, create your indexer and searcher for a new index that uses this analyser. Then run an Examine search against that index from Web API or wherever you need it.

    Take a look at this as well: http://blog.aabech.no/archive/building-a-spell-checker-for-search-in-umbraco/. It is a spell checker example, but the Stack Overflow answer also builds on a spell checker. In fact, I would use Lars's code to create the indexer for Examine but, as stated before, use the analyser from Stack Overflow.

    Regards

    Ismail
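    For anyone wanting to see what the suggested AutoCompleteAnalyzer actually emits, here is a minimal, dependency-free C# approximation of its chain: tokenize, lowercase, fold accents, drop stop words, then emit front edge n-grams of 1 to 20 characters. It is not Lucene's code; the regex tokenizer, the Unicode-normalization accent folding and the shortened stop list are simplified stand-ins, and the `AutoCompleteAnalyzerSketch` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Dependency-free approximation of the AutoCompleteAnalyzer chain (hypothetical name).
// tokenize -> lowercase -> strip accents -> drop stop words -> front edge n-grams
public static class AutoCompleteAnalyzerSketch
{
    // Shortened stop list for illustration only.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "the", "of", "to", "in", "is"
    };

    public static List<string> Analyze(string text, int minGram = 1, int maxGram = 20)
    {
        var grams = new List<string>();

        // Simplified tokenizer: split on anything that is not a letter or a digit.
        foreach (string raw in Regex.Split(text, @"[^\p{L}\p{Nd}]+"))
        {
            if (raw.Length == 0) continue;

            string token = FoldAccents(raw.ToLowerInvariant());
            if (StopWords.Contains(token)) continue;

            // Front edge n-grams: "cafe" -> "c", "ca", "caf", "cafe".
            int upper = Math.Min(maxGram, token.Length);
            for (int len = minGram; len <= upper; len++)
                grams.Add(token.Substring(0, len));
        }
        return grams;
    }

    // Decompose accented characters and drop the combining marks ("café" -> "cafe").
    private static string FoldAccents(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

    For example, `Analyze("The Café")` drops "the" as a stop word and yields `c, ca, caf, cafe`. Because every prefix is stored as its own term, a plain term query for "caf" can match the whole word, which is why the suggest index needs its own analyser rather than the default one.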

  • Ismail Mayat 4511 posts 10090 karma points MVP 2x admin c-trib
    Jul 23, 2018 @ 08:20

    Matthew,

    I have something working. Using Lars's blog post, I created a spell checker. Next, I created another indexer using the code from that Stack Overflow post, and it all works.

    A couple of issues: I could not get it to work without the spell checker index as an intermediary. Also, as you add new documents you would need to rebuild this suggest index. I will play a bit more when I have time; ideally this needs to be an Examine index.
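    The two-stage setup described here (a unique-word list with document counts, which is what the spell checker index provides, feeding a separate suggest index) can be sketched in memory as below. Plain dictionaries stand in for the Lucene indexes, the word splitting is simplified, and `SuggestIndexSketch` and its members are hypothetical names. It mirrors two choices in the Stack Overflow code: words shorter than three characters are skipped, and suggestions are ranked by the number of documents a word appears in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory sketch of the spell-checker -> suggest-index pipeline (hypothetical names).
public sealed class SuggestIndexSketch
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // prefix -> words starting with it (plays the role of the n-grammed "words" field)
    private readonly Dictionary<string, List<string>> _byPrefix = new Dictionary<string, List<string>>();
    // word -> number of documents containing it (plays the role of the "count" field)
    private readonly Dictionary<string, int> _docFreq = new Dictionary<string, int>();

    // Builds from scratch; documents added later only appear after the next Build.
    public static SuggestIndexSketch Build(IEnumerable<string> documents, int minWordLength = 3, int maxGram = 20)
    {
        var index = new SuggestIndexSketch();

        // Stage 1: unique words with their document frequency (the spell checker's job).
        foreach (string doc in documents)
        {
            var words = doc.ToLowerInvariant()
                           .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                           .Distinct();
            foreach (string word in words)
            {
                index._docFreq.TryGetValue(word, out int count); // 0 when missing
                index._docFreq[word] = count + 1;
            }
        }

        // Stage 2: index every front edge n-gram of each word that is long enough.
        foreach (string word in index._docFreq.Keys)
        {
            if (word.Length < minWordLength) continue; // too short, skip it
            for (int len = 1; len <= Math.Min(maxGram, word.Length); len++)
            {
                string prefix = word.Substring(0, len);
                if (!index._byPrefix.TryGetValue(prefix, out List<string> bucket))
                {
                    bucket = new List<string>();
                    index._byPrefix[prefix] = bucket;
                }
                bucket.Add(word);
            }
        }
        return index;
    }

    // Completions for a partial word, most widely used first (ties broken alphabetically).
    public IEnumerable<string> SuggestTermsFor(string term, int maxResults = 8)
    {
        if (!_byPrefix.TryGetValue(term.ToLowerInvariant(), out List<string> words))
            return Enumerable.Empty<string>();

        return words.OrderByDescending(w => _docFreq[w])
                    .ThenBy(w => w, StringComparer.Ordinal)
                    .Take(maxResults)
                    .ToList();
    }
}
```

    Because `Build` starts from the full document set every time, the sketch also shows the rebuild cost raised in this post: a newly published document only contributes suggestions after the next full rebuild.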

  • Matthew Wise 271 posts 1373 karma points MVP 4x c-trib
    Jul 23, 2018 @ 08:26

    Hi Ismail,

    Sounds like great progress. I am hoping to get back to this later this week or next.

    When I was looking at it, I figured updating it would become an issue.

    Matt

  • Rasmus Fjord 675 posts 1566 karma points c-trib
    Apr 10, 2019 @ 13:45

    Hey, I'm trying to follow the steps here and piece it together, but I'm stuck.

    I've built Eric's spell checker and it works, super great, but it works on a word-by-word basis. So I looked at the Stack Overflow example and have implemented that.

    Whenever I try to use the autocomplete analyser, it just throws a YSOD:

    Parser Error Message: Value cannot be null. Parameter name: type

      <add name="AutoSuggestIndexer"
           type="test.Umbraco.Search.AutoSuggest.Indexers.AutoSuggestIndexer, test.Umbraco.Search.AutoSuggest"
           analyzer="test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete.AutoCompleteAnalyzer, test.Umbraco.Search.AutoSuggest"/>
    

    Any pointers?
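    One plausible reading of `Value cannot be null. Parameter name: type` (an assumption, not checked against Examine's source) is that the type named in the `analyzer` attribute could not be resolved, so a null `Type` reached something like `Activator.CreateInstance`. In the code posted below, `AutoCompleteAnalyzer` is a class nested inside `SearchAutoComplete`, and .NET type-name strings join a nested class to its parent with `+`, not `.`. The self-contained demo below uses hypothetical stand-in classes with the same shape to show the difference.

```csharp
using System;

// Hypothetical stand-ins shaped like the posted code: an analyzer nested in a helper class.
public class SearchAutoComplete
{
    public class AutoCompleteAnalyzer { }
}

public static class NestedTypeNameDemo
{
    // The reflection name of a nested class joins parent and child with '+'.
    public static string ReflectionName() =>
        typeof(SearchAutoComplete.AutoCompleteAnalyzer).FullName;

    // Type.GetType(string) returns null, rather than throwing, when a name cannot be resolved.
    public static bool Resolves(string typeName) => Type.GetType(typeName) != null;
}
```

    With the real types, the attribute would presumably read `test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete+AutoCompleteAnalyzer, test.Umbraco.Search.AutoSuggest`, while the dotted spelling resolves to nothing.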

  • Rasmus Fjord 675 posts 1566 karma points c-trib
    Apr 11, 2019 @ 09:31

    Just a follow-up here.

    Still no further; I keep getting an error:

    Unable to cast object of type 'test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete' to type 'Lucene.Net.Analysis.Analyzer'.
    

    I've built Eric's spell checker indexer, which works great, but I need to combine it with the autocomplete analyser from Stack Overflow.

    So I've taken the code from Stack Overflow and modified it a bit so that it points at the right folder for the index, etc. It now looks like this:

    // Namespaces assume Lucene.Net 2.9.x (plus the contrib NGram and SpellChecker
    // assemblies) and the Examine version that ships with Umbraco 7.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Examine.LuceneEngine.Providers;
    using Examine.Providers;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.NGram;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using SpellChecker.Net.Search.Spell;

    public class SearchAutoComplete
    {
        public int MaxResults { get; set; }

        // Analyzer for the "words" field: splits each word into front edge n-grams
        private class AutoCompleteAnalyzer : Analyzer
        {
            public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
            {
                TokenStream result = new StandardTokenizer(kLuceneVersion, reader);

                result = new StandardFilter(result);
                result = new LowerCaseFilter(result);
                result = new ASCIIFoldingFilter(result);
                result = new StopFilter(false, result, StopFilter.MakeStopSet(kEnglishStopWords));
                result = new EdgeNGramTokenFilter(
                    result, EdgeNGramTokenFilter.DEFAULT_SIDE, 1, 20);

                return result;
            }
        }

        private static readonly Lucene.Net.Util.Version kLuceneVersion = Lucene.Net.Util.Version.LUCENE_29;

        private static readonly String kGrammedWordsField = "words";

        private static readonly String kSourceWordField = "sourceWord";

        private static readonly String kCountField = "count";

        private static readonly String[] kEnglishStopWords = {
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "i", "if", "in", "into", "is",
            "no", "not", "of", "on", "or", "s", "such",
            "t", "that", "the", "their", "then", "there", "these",
            "they", "this", "to", "was", "will", "with"
        };

        private readonly Directory m_directory;

        private IndexReader m_reader;

        private IndexSearcher m_searcher;

        private readonly Analyzer _analyzer;

        public SearchAutoComplete(BaseSearchProvider baseSearchProvider)
        {
            // note: luceneSearch is unused and m_directory is never set here,
            // so ReplaceSearcher/SuggestTermsFor cannot work from this overload
            var luceneSearch = (BaseLuceneSearcher)baseSearchProvider;
            _analyzer = new AutoCompleteAnalyzer();
        }

        public SearchAutoComplete()
        {
            _analyzer = new AutoCompleteAnalyzer();

            this.m_directory = FSDirectory.Open(
                new System.IO.DirectoryInfo(
                    System.Web.Hosting.HostingEnvironment.ApplicationPhysicalPath + @"App_Data\TEMP\ExamineIndexes\" + Environment.MachineName + @"\AutoSuggest\Index\"));
            MaxResults = 8;

            ReplaceSearcher();
        }

        /// <summary>
        /// Find terms matching the given partial word that appear in the highest number of documents.</summary>
        /// <param name="term">A word or part of a word</param>
        /// <returns>A list of suggested completions</returns>
        public IEnumerable<String> SuggestTermsFor(string term)
        {
            if (m_searcher == null)
                return new string[] { };

            // get the top terms for query; sort by count descending so the
            // most widely used words come first, as the summary promises
            Query query = new TermQuery(new Term(kGrammedWordsField, term.ToLower()));
            Sort sort = new Sort(new SortField(kCountField, SortField.INT, true));

            TopDocs docs = m_searcher.Search(query, null, MaxResults, sort);
            string[] suggestions = docs.ScoreDocs.Select(doc =>
                m_reader.Document(doc.doc).Get(kSourceWordField)).ToArray();

            return suggestions;
        }

        /// <summary>
        /// Open the index in the given directory and create a new index of word frequency for the
        /// given index.</summary>
        /// <param name="sourceDirectory">Directory containing the index to count words in.</param>
        /// <param name="fieldToAutocomplete">The field in the index that should be analyzed.</param>
        public void BuildAutoCompleteIndex(Directory sourceDirectory, String fieldToAutocomplete)
        {
            // build a dictionary (from the spell package)
            using (IndexReader sourceReader = IndexReader.Open(sourceDirectory, true))
            {
                LuceneDictionary dict = new LuceneDictionary(sourceReader, fieldToAutocomplete);

                // code from
                // org.apache.lucene.search.spell.SpellChecker.indexDictionary(
                // Dictionary)
                //IndexWriter.Unlock(m_directory);

                // use a custom analyzer so we can do EdgeNGramFiltering
                var analyzer = new AutoCompleteAnalyzer();
                using (var writer = new IndexWriter(m_directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))
                {
                    writer.SetMergeFactor(300);
                    writer.SetMaxBufferedDocs(150);

                    // go through every word, storing the original word (incl. n-grams)
                    // and the number of times it occurs
                    foreach (string word in dict)
                    {
                        if (word.Length < 3)
                            continue; // too short we bail but "too long" is fine...

                        // ok index the word
                        // use the number of documents this word appears in
                        int freq = sourceReader.DocFreq(new Term(fieldToAutocomplete, word));
                        var doc = MakeDocument(fieldToAutocomplete, word, freq);

                        writer.AddDocument(doc);
                    }

                    writer.Optimize();
                }
            }

            // re-open our reader
            ReplaceSearcher();
        }

        private static Document MakeDocument(String fieldToAutocomplete, string word, int frequency)
        {
            var doc = new Document();
            doc.Add(new Field(kSourceWordField, word, Field.Store.YES,
                    Field.Index.NOT_ANALYZED)); // orig term
            doc.Add(new Field(kGrammedWordsField, word, Field.Store.YES,
                    Field.Index.ANALYZED)); // grammed
            doc.Add(new Field(kCountField,
                    frequency.ToString(), Field.Store.NO,
                    Field.Index.NOT_ANALYZED)); // count, parsed back as an int when sorting
            return doc;
        }

        private void ReplaceSearcher()
        {
            if (IndexReader.IndexExists(m_directory))
            {
                if (m_reader == null)
                {
                    m_reader = IndexReader.Open(m_directory, true);
                }
                else
                {
                    // Reopen() returns a new reader if the index changed; keep it
                    IndexReader newReader = m_reader.Reopen();
                    if (newReader != m_reader)
                    {
                        m_reader.Close();
                        m_reader = newReader;
                    }
                }

                m_searcher = new IndexSearcher(m_reader);
            }
            else
            {
                m_searcher = null;
            }
        }
    }

    It hits the constructor of my analyser and sets the correct index directory, so it goes on into the ReplaceSearcher method, but I am still getting the error.

    These are my indexer and searcher:

    Indexer:

      <add name="AutoSuggestIndexer"
           type="test.Umbraco.Search.AutoSuggest.Indexers.AutoSuggestIndexer, test.Umbraco.Search.AutoSuggest"
           analyzer="test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete, test.Umbraco.Search.AutoSuggest" />
    

    Searcher:

      <add name="AutoSuggestSearcher"
           type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
           analyzer="test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete, test.Umbraco.Search.AutoSuggest" />
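    The cast error fits the configuration shown: both `analyzer` attributes name `SearchAutoComplete`, which does not inherit from `Lucene.Net.Analysis.Analyzer`; only the nested `AutoCompleteAnalyzer` does. The sketch below uses hypothetical stand-in classes (its `Analyzer` is not the Lucene type) and a hypothetical `AnalyzerConfigCheck.Create` helper to show the check a configured type has to pass before it can be treated as an analyzer, reporting a clearer message instead of an InvalidCastException.

```csharp
using System;

// Hypothetical stand-ins: this Analyzer is NOT Lucene.Net.Analysis.Analyzer.
public abstract class Analyzer { }
public class SearchAutoComplete { }               // helper class, not an analyzer
public class AutoCompleteAnalyzer : Analyzer { }  // an actual Analyzer subclass

public static class AnalyzerConfigCheck
{
    // Hypothetical loader: resolve the configured name, confirm it derives from
    // Analyzer, then instantiate it. Casting an unrelated type straight to
    // Analyzer is the kind of operation that throws "Unable to cast object of type ...".
    public static Analyzer Create(string typeName)
    {
        Type type = Type.GetType(typeName, throwOnError: true);

        if (!typeof(Analyzer).IsAssignableFrom(type))
            throw new InvalidOperationException(
                $"'{type.FullName}' does not derive from Analyzer, so it cannot be used as one.");

        return (Analyzer)Activator.CreateInstance(type);
    }
}
```

    In other words, whatever the `analyzer` attribute names has to be the analyzer class itself, not the helper class that contains it.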
    