Examine autocomplete / suggest
Hi,
I'm trying to create a site search that suggests search terms as the user searches.
The term may be multiple words and can appear within the bodyText etc.
I have found the following, however I'm not sure how I would achieve the same thing through Examine, as the code builds and queries a separate index.
https://stackoverflow.com/a/9183416/503301
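In outline, the linked code builds a separate n-gram "suggest" index from the terms of an existing index. Below is a rough sketch of pointing it at an existing Examine index as the source; it assumes Examine's LuceneIndexer exposes GetLuceneDirectory(), and the "ExternalIndexer" and "bodyText" names are only examples.
using Examine;
using Examine.LuceneEngine.Providers;

public static class AutoSuggestBootstrap
{
    public static void BuildSuggestIndex()
    {
        // Grab the Lucene directory behind an existing Examine index and use it as the
        // word source for the Stack Overflow SearchAutoComplete class.
        // GetLuceneDirectory(), "ExternalIndexer" and "bodyText" are assumptions.
        var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"];
        var sourceDirectory = indexer.GetLuceneDirectory();

        new SearchAutoComplete().BuildAutoCompleteIndex(sourceDirectory, "bodyText");
    }
}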
As a quick guess: build the AutoCompleteAnalyzer from that Stack Overflow post, then in Examine create your indexer and searcher for a new index but use this analyser. Then do an Examine search in a Web API controller or wherever against that index.
Take a look at this as well: http://blog.aabech.no/archive/building-a-spell-checker-for-search-in-umbraco/ - it's a spell checker example, but the Stack Overflow answer builds on the same approach. In fact I would use Lars's code to create the indexer for Examine but, as stated before, use the analyser from Stack Overflow.
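For the Web API part, a rough sketch along these lines (the controller and searcher names are made up; the "words" and "sourceWord" fields are the ones the Stack Overflow code writes):
using System.Collections.Generic;
using System.Linq;
using Examine;
using Umbraco.Web.WebApi;

public class AutoSuggestController : UmbracoApiController
{
    // GET /umbraco/api/autosuggest/suggest?q=umbr
    public IEnumerable<string> Suggest(string q)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["AutoSuggestSearcher"];

        // Search the n-grammed "words" field and return the original source words.
        var criteria = searcher.CreateSearchCriteria();
        var query = criteria.Field("words", q.ToLower()).Compile();

        return searcher.Search(query)
                       .OrderByDescending(r => r.Score)
                       .Select(r => r.Fields["sourceWord"])
                       .Distinct()
                       .Take(8);
    }
}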
Regards
Ismail
Matthew,
I have something working. What I did was create a spellchecker using Lars's blog post, then create another indexer using the code from that Stack Overflow post, and it all works.
A couple of issues: I could not get it to work without having the spellchecker index as an intermediary, and as you add new documents you need to rebuild the suggest index. I will play a bit more when I have time; ideally this needs to be an Examine index.
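For the rebuild issue, a rough sketch of one way to retrigger it whenever content is published (an Umbraco 7 ApplicationEventHandler; the spellchecker index path and the "word" field are assumptions about where that intermediate index lives):
using System.IO;
using System.Web.Hosting;
using Lucene.Net.Store;
using Umbraco.Core;
using Umbraco.Core.Services;

public class AutoSuggestEvents : ApplicationEventHandler
{
    protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
        ContentService.Published += (sender, args) =>
        {
            // Rebuild the n-gram suggest index from the intermediate spellchecker index
            // every time content is published. Path and field name are placeholders.
            var spellcheckerPath = HostingEnvironment.MapPath("~/App_Data/TEMP/ExamineIndexes/Spellchecker");
            var sourceDirectory = FSDirectory.Open(new DirectoryInfo(spellcheckerPath));

            new SearchAutoComplete().BuildAutoCompleteIndex(sourceDirectory, "word");
        };
    }
}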
Hi Ismail,
Sounds like great progress. I'm hoping to get back on this later in the week or next.
When I was looking at it I figured updating it would become an issue.
Matt
Hey, I'm trying to follow the builds here and piece it together, but I'm stuck.
I've built Lars Erik's spellchecker and it works great, but that works on a word-by-word basis, so I looked at the Stack Overflow example and implemented that.
Whenever I try to use the autocomplete analyser, it just throws a YSOD:
Parser Error Message: Value cannot be null. Parameter name: type
Any pointers?
Just a follow up here.
Still not any further; I keep getting this error:
Unable to cast object of type 'test.Umbraco.Search.AutoSuggest.Searchers.SearchAutoComplete' to type 'Lucene.Net.Analysis.Analyzer'.
I've built Lars Erik's spellchecker indexer, which works great, but I need to combine it with the autocomplete analyser from Stack Overflow.
So I've taken the code from Stack Overflow and modified it a bit so that it points at the right folder for the index etc. It looks like this now:
// usings: the usual namespaces for Lucene.Net 2.9.x, Examine 0.1.x and SpellChecker.Net (LuceneDictionary)
using System;
using System.Collections.Generic;
using System.Linq;
using Examine.LuceneEngine.Providers;
using Examine.Providers;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using SpellChecker.Net.Search.Spell;

public class SearchAutoComplete
{
    public int MaxResults { get; set; }

    private class AutoCompleteAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            // Standard tokens, lowercased and accent-folded, stop words removed,
            // then split into edge n-grams of 1..20 characters for prefix matching.
            TokenStream result = new StandardTokenizer(kLuceneVersion, reader);
            result = new StandardFilter(result);
            result = new LowerCaseFilter(result);
            result = new ASCIIFoldingFilter(result);
            result = new StopFilter(false, result, StopFilter.MakeStopSet(kEnglishStopWords));
            result = new EdgeNGramTokenFilter(
                result, Lucene.Net.Analysis.NGram.EdgeNGramTokenFilter.DEFAULT_SIDE, 1, 20);
            return result;
        }
    }

    private static readonly Lucene.Net.Util.Version kLuceneVersion = Lucene.Net.Util.Version.LUCENE_29;
    private static readonly String kGrammedWordsField = "words";
    private static readonly String kSourceWordField = "sourceWord";
    private static readonly String kCountField = "count";
    private static readonly String[] kEnglishStopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "i", "if", "in", "into", "is",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    };

    private readonly Directory m_directory;
    private IndexReader m_reader;
    private IndexSearcher m_searcher;
    private readonly Analyzer _analyzer;

    public SearchAutoComplete(BaseSearchProvider baseSearchProvider)
    {
        // Work in progress: m_directory is not set on this code path yet.
        var luceneSearch = (BaseLuceneSearcher)baseSearchProvider;
        _analyzer = new AutoCompleteAnalyzer();
    }

    public SearchAutoComplete()
    {
        _analyzer = new AutoCompleteAnalyzer();
        this.m_directory = FSDirectory.Open(
            new System.IO.DirectoryInfo(
                System.Web.Hosting.HostingEnvironment.ApplicationPhysicalPath + @"App_Data\TEMP\ExamineIndexes\" + Environment.MachineName + @"\AutoSuggest\Index\"));
        MaxResults = 8;
        ReplaceSearcher();
    }

    /// <summary>
    /// Find terms matching the given partial word that appear in the highest number of documents.</summary>
    /// <param name="term">A word or part of a word</param>
    /// <returns>A list of suggested completions</returns>
    public IEnumerable<String> SuggestTermsFor(string term)
    {
        if (m_searcher == null)
            return new string[] { };

        // get the top terms for the query, ordered by document frequency
        Query query = new TermQuery(new Term(kGrammedWordsField, term.ToLower()));
        Sort sort = new Sort(new SortField(kCountField, SortField.INT));

        TopDocs docs = m_searcher.Search(query, null, MaxResults, sort);
        string[] suggestions = docs.ScoreDocs.Select(doc =>
            m_reader.Document(doc.doc).Get(kSourceWordField)).ToArray();
        return suggestions;
    }

    /// <summary>
    /// Open the index in the given directory and create a new index of word frequency for the
    /// given index.</summary>
    /// <param name="sourceDirectory">Directory containing the index to count words in.</param>
    /// <param name="fieldToAutocomplete">The field in the index that should be analyzed.</param>
    public void BuildAutoCompleteIndex(Directory sourceDirectory, String fieldToAutocomplete)
    {
        // build a dictionary (from the spell package)
        using (IndexReader sourceReader = IndexReader.Open(sourceDirectory, true))
        {
            LuceneDictionary dict = new LuceneDictionary(sourceReader, fieldToAutocomplete);

            // code from org.apache.lucene.search.spell.SpellChecker.indexDictionary(Dictionary)
            //IndexWriter.Unlock(m_directory);

            // use a custom analyzer so we can do EdgeNGramFiltering
            var analyzer = new AutoCompleteAnalyzer();
            using (var writer = new IndexWriter(m_directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))
            {
                writer.SetMergeFactor(300);
                writer.SetMaxBufferedDocs(150);

                // go through every word, storing the original word (incl. n-grams)
                // and the number of times it occurs
                foreach (string word in dict)
                {
                    if (word.Length < 3)
                        continue; // too short we bail but "too long" is fine...

                    // index the word, using the number of documents it appears in
                    int freq = sourceReader.DocFreq(new Term(fieldToAutocomplete, word));
                    var doc = MakeDocument(fieldToAutocomplete, word, freq);
                    writer.AddDocument(doc);
                }
                writer.Optimize();
            }
        }
        // re-open our reader
        ReplaceSearcher();
    }

    private static Document MakeDocument(String fieldToAutocomplete, string word, int frequency)
    {
        var doc = new Document();
        doc.Add(new Field(kSourceWordField, word, Field.Store.YES,
            Field.Index.NOT_ANALYZED)); // orig term
        doc.Add(new Field(kGrammedWordsField, word, Field.Store.YES,
            Field.Index.ANALYZED)); // grammed
        doc.Add(new Field(kCountField,
            frequency.ToString(), Field.Store.NO,
            Field.Index.NOT_ANALYZED)); // count
        return doc;
    }

    private void ReplaceSearcher()
    {
        if (IndexReader.IndexExists(m_directory))
        {
            if (m_reader == null)
                m_reader = IndexReader.Open(m_directory, true);
            else
                m_reader = m_reader.Reopen(); // Reopen() returns a new reader; keep the reference
            m_searcher = new IndexSearcher(m_reader);
        }
        else
        {
            m_searcher = null;
        }
    }
}
It hits the constructor of my analyser and sets the correct index, so it gets into the ReplaceSearcher method, but I'm still getting the error.
This is my Indexer and Searcher:
Indexer
Searcher:
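One thing worth checking, given the cast error above: wherever Examine is handed an analyzer type, that type itself has to derive from Lucene.Net.Analysis.Analyzer, and in the code above AutoCompleteAnalyzer is only a private nested class inside SearchAutoComplete. A sketch of pulling it out into its own public type (class placement and namespace are up to you; this is not a confirmed fix):
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;

// Same token chain as the nested analyzer above, exposed as a standalone public Analyzer
// so it can be referenced anywhere an Analyzer instance or type is expected.
public class AutoCompleteAnalyzer : Analyzer
{
    private static readonly string[] EnglishStopWords = {
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "i", "if", "in", "into", "is",
        "no", "not", "of", "on", "or", "s", "such",
        "t", "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"
    };

    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new ASCIIFoldingFilter(result);
        result = new StopFilter(false, result, StopFilter.MakeStopSet(EnglishStopWords));
        result = new EdgeNGramTokenFilter(result, EdgeNGramTokenFilter.DEFAULT_SIDE, 1, 20);
        return result;
    }
}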