Examine autocomplete / suggest
Hi,
I'm trying to create a site search that suggests search terms as the user searches.
The term may be multiple words and can appear within the bodyText etc.
I have found the following, however I'm not sure how I would achieve the same thing through Examine, as the code builds and queries a separate index.
https://stackoverflow.com/a/9183416/503301
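In outline, the linked code builds a separate n-gram "suggest" index from the terms of an existing index. Below is a rough sketch of pointing it at an existing Examine index as the source; it assumes Examine's LuceneIndexer exposes GetLuceneDirectory(), and the "ExternalIndexer" and "bodyText" names are only examples.
using Examine;
using Examine.LuceneEngine.Providers;

public static class AutoSuggestBootstrap
{
    public static void BuildSuggestIndex()
    {
        // Grab the Lucene directory behind an existing Examine index and use it as the
        // word source for the Stack Overflow SearchAutoComplete class.
        // GetLuceneDirectory(), "ExternalIndexer" and "bodyText" are assumptions.
        var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"];
        var sourceDirectory = indexer.GetLuceneDirectory();

        new SearchAutoComplete().BuildAutoCompleteIndex(sourceDirectory, "bodyText");
    }
}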
As a quick guess: build the AutoCompleteAnalyzer from that Stack Overflow post, then in Examine create your indexer and searcher for a new index but use this analyser. Then do an Examine search in a Web API controller or wherever against that index.
Take a look at this as well: http://blog.aabech.no/archive/building-a-spell-checker-for-search-in-umbraco/ - it's a spell checker example, but the Stack Overflow answer builds on the same approach. In fact I would use Lars's code to create the indexer for Examine but, as stated before, use the analyser from Stack Overflow.
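For the Web API part, a rough sketch along these lines (the controller and searcher names are made up; the "words" and "sourceWord" fields are the ones the Stack Overflow code writes):
using System.Collections.Generic;
using System.Linq;
using Examine;
using Umbraco.Web.WebApi;

public class AutoSuggestController : UmbracoApiController
{
    // GET /umbraco/api/autosuggest/suggest?q=umbr
    public IEnumerable<string> Suggest(string q)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["AutoSuggestSearcher"];

        // Search the n-grammed "words" field and return the original source words.
        var criteria = searcher.CreateSearchCriteria();
        var query = criteria.Field("words", q.ToLower()).Compile();

        return searcher.Search(query)
                       .OrderByDescending(r => r.Score)
                       .Select(r => r.Fields["sourceWord"])
                       .Distinct()
                       .Take(8);
    }
}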
Regards
Ismail
Matthew,
I have something working. What I did was create a spellchecker using Lars's blog post, then create another indexer using the code from that Stack Overflow post, and it all works.
A couple of issues: I could not get it to work without having the spellchecker index as an intermediary, and as you add new documents you need to rebuild the suggest index. I will play a bit more when I have time; ideally this needs to be an Examine index.
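For the rebuild issue, a rough sketch of one way to retrigger it whenever content is published (an Umbraco 7 ApplicationEventHandler; the spellchecker index path and the "word" field are assumptions about where that intermediate index lives):
using System.IO;
using System.Web.Hosting;
using Lucene.Net.Store;
using Umbraco.Core;
using Umbraco.Core.Services;

public class AutoSuggestEvents : ApplicationEventHandler
{
    protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        ContentService.Published += (sender, args) =>
        {
            // Rebuild the n-gram suggest index from the intermediate spellchecker index
            // every time content is published. Path and field name are placeholders.
            var spellcheckerPath = HostingEnvironment.MapPath("~/App_Data/TEMP/ExamineIndexes/Spellchecker");
            var sourceDirectory = FSDirectory.Open(new DirectoryInfo(spellcheckerPath));

            new SearchAutoComplete().BuildAutoCompleteIndex(sourceDirectory, "word");
        };
    }
}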
Hi Ismail,
Sounds like great progress. I'm hoping to get back on this later in the week or next.
When I was looking at it I figured updating it would become an issue.
Matt
Hey, I'm trying to follow the builds here and piece it together, but I'm stuck.
I've built Lars Erik's spellchecker and it works great, but that works on a word-by-word basis, so I looked at the Stack Overflow example and implemented that.
Whenever I try to use the autocomplete analyser, it just throws a YSOD:
Parser Error Message: Value cannot be null. Parameter name: type
Any pointers?
Just a follow up here.
Still not any further; I keep getting this error:
Unable to cast object of type 'test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete' to type 'Lucene.Net.Analysis.Analyzer'.
I've built Lars Erik's spellchecker indexer, which works great, but I need to combine it with the autocomplete analyser from Stack Overflow.
So I've taken the code from Stack Overflow and modified it a bit so that it points at the right folder for the index etc. It looks like this now:
// usings: the usual namespaces for Lucene.Net 2.9.x, Examine 0.1.x and SpellChecker.Net (LuceneDictionary)
using System;
using System.Collections.Generic;
using System.Linq;
using Examine.LuceneEngine.Providers;
using Examine.Providers;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using SpellChecker.Net.Search.Spell;

public class SearchAutoComplete
{
    public int MaxResults { get; set; }

    private class AutoCompleteAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            // Standard tokens, lowercased and accent-folded, stop words removed,
            // then split into edge n-grams of 1..20 characters for prefix matching.
            TokenStream result = new StandardTokenizer(kLuceneVersion, reader);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(result);
            result = new ASCIIFoldingFilter(result);
            result = new StopFilter(false, result, StopFilter.MakeStopSet(kEnglishStopWords));
            result = new EdgeNGramTokenFilter(
                result, Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.DEFAULT_SIDE, 1, 20);
            return result;
        }
    }

    private static readonly Lucene.Net.Util.Version kLuceneVersion = Lucene.Net.Util.Version.LUCENE_29;
    private static readonly String kGrammedWordsField = "words";
    private static readonly String kSourceWordField = "sourceWord";
    private static readonly String kCountField = "count";
    private static readonly String[] kEnglishStopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "i", "if", "in", "into", "is",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    };

    private readonly Directory m_directory;
    private IndexReader m_reader;
    private IndexSearcher m_searcher;
    private readonly Analyzer _analyzer;

    public SearchAutoComplete(BaseSearchProvider baseSearchProvider)
    {
        // Work in progress: m_directory is not set on this code path yet.
        var luceneSearch = (BaseLuceneSearcher)baseSearchProvider;
        _analyzer = new AutoCompleteAnalyzer();
    }

    public SearchAutoComplete()
    {
        _analyzer = new AutoCompleteAnalyzer();
        this.m_directory = FSDirectory.Open(
            new System.IO.DirectoryInfo(
                System.Web.Hosting.HostingEnvironment.ApplicationPhysicalPath + @"App_Data\TEMP\ExamineIndexes\" + Environment.MachineName + @"\AutoSuggest\Index\"));
        MaxResults = 8;
        ReplaceSearcher();
    }

    /// <summary>
    /// Find terms matching the given partial word that appear in the highest number of documents.</summary>
    /// <param name="term">A word or part of a word</param>
    /// <returns>A list of suggested completions</returns>
    public IEnumerable<String> SuggestTermsFor(string term)
    {
        if (m_searcher == null)
            return new string[] { };

        // get the top terms for the query, ordered by document frequency
        Query query = new TermQuery(new Term(kGrammedWordsField, term.ToLower()));
        Sort sort = new Sort(new SortField(kCountField, SortField.INT));

        TopDocs docs = m_searcher.Search(query, null, MaxResults, sort);
        string[] suggestions = docs.ScoreDocs.Select(doc =>
            m_reader.Document(doc.doc).Get(kSourceWordField)).ToArray();
        return suggestions;
    }

    /// <summary>
    /// Open the index in the given directory and create a new index of word frequency for the
    /// given index.</summary>
    /// <param name="sourceDirectory">Directory containing the index to count words in.</param>
    /// <param name="fieldToAutocomplete">The field in the index that should be analyzed.</param>
    public void BuildAutoCompleteIndex(Directory sourceDirectory, String fieldToAutocomplete)
    {
        // build a dictionary (from the spell package)
        using (IndexReader sourceReader = IndexReader.Open(sourceDirectory, true))
        {
            LuceneDictionary dict = new LuceneDictionary(sourceReader, fieldToAutocomplete);

            // code from org.apache.lucene.search.spell.SpellChecker.indexDictionary(Dictionary)
            //IndexWriter.Unlock(m_directory);

            // use a custom analyzer so we can do EdgeNGramFiltering
            var analyzer = new AutoCompleteAnalyzer();
            using (var writer = new IndexWriter(m_directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))
            {
                writer.SetMergeFactor(300);
                writer.SetMaxBufferedDocs(150);

                // go through every word, storing the original word (incl. n-grams)
                // and the number of times it occurs
                foreach (string word in dict)
                {
                    if (word.Length < 3)
                        continue; // too short we bail but "too long" is fine...

                    // index the word, using the number of documents it appears in
                    int freq = sourceReader.DocFreq(new Term(fieldToAutocomplete, word));
                    var doc = MakeDocument(fieldToAutocomplete, word, freq);
                    writer.AddDocument(doc);
                }
                writer.Optimize();
            }
        }
        // re-open our reader
        ReplaceSearcher();
    }

    private static Document MakeDocument(String fieldToAutocomplete, string word, int frequency)
    {
        var doc = new Document();
        doc.Add(new Field(kSourceWordField, word, Field.Store.YES,
            Field.Index.NOT_ANALYZED)); // orig term
        doc.Add(new Field(kGrammedWordsField, word, Field.Store.YES,
            Field.Index.ANALYZED)); // grammed
        doc.Add(new Field(kCountField,
            frequency.ToString(), Field.Store.NO,
            Field.Index.NOT_ANALYZED)); // count
        return doc;
    }

    private void ReplaceSearcher()
    {
        if (IndexReader.IndexExists(m_directory))
        {
            if (m_reader == null)
                m_reader = IndexReader.Open(m_directory, true);
            else
                m_reader = m_reader.Reopen(); // Reopen() returns a new reader; keep the reference
            m_searcher = new IndexSearcher(m_reader);
        }
        else
        {
            m_searcher = null;
        }
    }
}
It hits the constructor of my analyser and sets the correct index, so it gets into the ReplaceSearcher method, but I'm still getting the error.
This is my Indexer and Searcher:
Indexer
Searcher:
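One thing worth checking, given the cast error above: wherever Examine is handed an analyzer type, that type itself has to derive from Lucene.Net.Analysis.Analyzer, and in the code above AutoCompleteAnalyzer is only a private nested class inside SearchAutoComplete. A sketch of pulling it out into its own public type (class placement and namespace are up to you; this is not a confirmed fix):
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;

// Same token chain as the nested analyzer above, exposed as a standalone public Analyzer
// so it can be referenced anywhere an Analyzer instance or type is expected.
public class AutoCompleteAnalyzer : Analyzer
{
    private static readonly string[] EnglishStopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "i", "if", "in", "into", "is",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    };

    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new ASCIIFoldingFilter(result);
        result = new StopFilter(false, result, StopFilter.MakeStopSet(EnglishStopWords));
        result = new EdgeNGramTokenFilter(result, EdgeNGramTokenFilter.DEFAULT_SIDE, 1, 20);
        return result;
    }
}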