I've got an issue that seems to come up on the forum every so often. Basically, I've got a site with a search feature; however, when a phrase contains words like "on", "and" or "if", other Examine stop words, or two-character terms, it throws an exception.
Current test phrases:
"Focus on sales" - This fails
"Focus sales" - This works
Unfortunately, I didn't write the original search script, so I'm not 100% sure what it is doing.
// Split the raw query string into space-separated terms.
var q = Request.QueryString["Query"];
var q_split = q.Trim().Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
var fieldsToSearch = new[]
{
    "nodeName", "seoDescription", "archetypeBody"
};
var searcher = ExamineManager.Instance.SearchProviderCollection["ContentSearcher"];
var criteria = searcher.CreateSearchCriteria(IndexTypes.Content, BooleanOperation.And);
// First term: boosted wildcard match on seoTitle, or a wildcard match on the other fields.
var query = criteria.Field("seoTitle", q_split.First().MultipleCharacterWildcard().Value.Boost(8));
query = query.Or().GroupedOr(fieldsToSearch, q_split.First().MultipleCharacterWildcard());
criteria.GroupedOr(new[] {"siteIdentifier"}, domainId.ToString());
// Remaining terms: the same pair of clauses, OR'd onto the query.
foreach (var term in q_split.Skip(1))
{
    query = query.Or().Field("seoTitle", term.MultipleCharacterWildcard().Value.Boost(8)).Or().GroupedOr(fieldsToSearch, term.MultipleCharacterWildcard());
}
// Run the search, highest score first, keeping results while the score stays above 0.5.
var searchResults = searcher.Search(query.Compile()).OrderByDescending(x => x.Score).TakeWhile(x => x.Score > 0.5f);
Object reference not set to an instance of an object.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.
Source Error:
Line 88: foreach (var term in q_split.Skip(1))
Line 89: {
Line 90: query = query.Or().Field("seoTitle", term.MultipleCharacterWildcard().Value.Boost(8));
Line 91: query = query.Or().GroupedOr(fieldsToSearch, term.MultipleCharacterWildcard());
Line 92: }
I split out the two bits that make up lines 90 and 91, as they were originally a single line. Nothing that I can watch/debug is null at any point.
When it raises the exception:
term = "on",
term.MultipleCharacterWildcard() has a value
term.MultipleCharacterWildcard().Value = "on*"
term.MultipleCharacterWildcard().Value.Boost(8) has a value
query also has a value
Additional stack trace info:
[NullReferenceException: Object reference not set to an instance of an object.]
Examine.LuceneEngine.SearchCriteria.LuceneSearchCriteria.GetFieldInternalQuery(String fieldName, IExamineValue fieldValue, Boolean useQueryParser) +565
Examine.LuceneEngine.SearchCriteria.LuceneQuery.Field(String fieldName, IExamineValue fieldValue) +29
Examine issue when text includes stop words
Hi all,
Having read the following threads:
https://our.umbraco.org/forum/developers/extending-umbraco/71626-examine-search-fails-for-two-character-words
https://our.umbraco.org/forum/ourumb-dev-forum/bugs/20727-Lucene-fails-when-seaching-on-common-words
it pointed me in the direction of the stop words. However, I cannot seem to put my finger on the right solution. There are times when a user will want to search for phrases containing stop words, or two-letter abbreviations.
Nik,
Which analyser is being used?
Also can you write out the generated query?
Regards
Ismail
Found this as well
Looks like the standard analyzer
When it generates a working query it looks like this:
I'll see if I can find out what the failing query looks like as it is causing a YSOD at the moment when it fails.
Hi Ismail,
The problem is that when it hits stop words, the following line throws the exception:
Something somewhere in there is causing a null reference exception, but when I individually check the parameters everything looks fine. I think it is caused somewhere in the Field() method or the GroupedOr() method.
So I never get to the point of having a query to process.
I did check to see if the MultipleCharacterWildcard() extension method resulted in null, but that isn't the case; it seems to be absolutely fine at that point.
Nik,
What's the actual exception?
Regards
Ismail
This is the error: a NullReferenceException ("Object reference not set to an instance of an object."), raised inside the Field() call. The full error output, the values I could see while debugging, and the stack trace are quoted above.
Nik,
I suspect that when 'on' is run through the standard analyser it's stripped out and you are left with just the wildcard. Can you experiment and take off the wildcard, so the line:
change to
See if that works?
Hey Ismail,
I just changed the first line of the two to:
This still results in the null exception being thrown within the Field method. It definitely doesn't like this very much.
This suggests that you can configure Lucene (and by extension Examine) to have no stop words: https://stackoverflow.com/a/17453193/2052963
I've never tried that myself, but it's worth a go. FWIW, I typically strip out any stop words and escape special characters with QueryParser.Escape.
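To illustrate the escaping step, here is a minimal sketch. It is a hand-rolled stand-in so that it runs without a Lucene.NET reference; in a real project you would call QueryParser.Escape itself. The class and method names (`QueryText`, `EscapeSpecialCharacters`) are hypothetical, and the character list is my understanding of Lucene's classic query syntax, so check it against the Lucene version you are using.

```csharp
using System;
using System.Text;

// Hypothetical stand-in so the sketch runs without a Lucene.NET reference.
public static class QueryText
{
    // Characters with special meaning in Lucene's classic query syntax
    // (assumed list; verify against your Lucene version).
    private const string SpecialCharacters = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash.
    public static string EscapeSpecialCharacters(string input)
    {
        var escaped = new StringBuilder();
        foreach (var c in input ?? string.Empty)
        {
            if (SpecialCharacters.IndexOf(c) >= 0)
            {
                escaped.Append('\\');
            }
            escaped.Append(c);
        }
        return escaped.ToString();
    }
}
```

For example, the input `sales (2016)` comes back as `sales \(2016\)`.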
Hey Nik,
I think you are hitting the same issue I came across the other day, which is down to an issue in Examine with boosted stop words (https://github.com/Shazwazza/Examine/issues/34).
That was going to be my next suggestion: take off the boost. Basically, the stop word is being removed and you are left with only the boost, hence it blows up.
I would remove the stop words; that should fix the issue.
Regards
Ismail
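Removing the stop words before building the query could look roughly like the sketch below. It is deliberately independent of Examine so that it runs on its own. The stop word list is hard-coded on the assumption that the index uses Lucene's default English set (33 entries), and `SearchTerms`, `StopWords`, `IsStopWord` and `Filter` are hypothetical names, not part of Examine or Lucene.NET. Treat it as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Examine or Lucene.NET).
public static class SearchTerms
{
    // Assumption: mirrors the default English stop words used by Lucene's
    // StandardAnalyzer. Adjust it if your index uses a different analyzer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[]
        {
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
            "such", "that", "the", "their", "then", "there", "these", "they",
            "this", "to", "was", "will", "with"
        },
        StringComparer.OrdinalIgnoreCase);

    // True when the term is in the stop word list (case-insensitive).
    public static bool IsStopWord(string term)
    {
        return !string.IsNullOrEmpty(term) && StopWords.Contains(term);
    }

    // Splits the raw query on spaces and removes stop words, so that no
    // boosted or wildcard clause is ever built from a stop word.
    public static string[] Filter(string rawQuery)
    {
        return (rawQuery ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !IsStopWord(term))
            .ToArray();
    }
}
```

With this, `SearchTerms.Filter("Focus on sales")` yields `Focus` and `sales`, and the existing loop can run over the filtered array instead of `q_split`. If the array comes back empty because the user typed only stop words, the search should be skipped, since calling `First()` on an empty array throws.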
Is there an easy way to check if a term is a stop word? I was looking at trying this:
and this
But both return false when the term is a stop word. Also, when I inspect ENGLISH_STOP_WORDS_SET it has a count of 33, but every list I find in the object is empty, so I'm a bit at a loss as to where they are.
Never mind, I missed a line of code in the issue linked by Tom that allows me to do the check I need :-)
Thanks guys, you all rock!