I was able to fix this by replacing those special characters with spaces (or nothing) before searching. I did this by adding a new helper function in ezSearch.cshtml (code sample edited to consolidate hyphen removal):
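```csharp
// Escape the search term by replacing characters that break the Lucene query
// (needs @using System.Text.RegularExpressions at the top of ezSearch.cshtml)
public string EscapeSearchTerm(string input)
{
    if (string.IsNullOrEmpty(input)) return input;

    // colon (:)
    input = input.Replace(":", " ");

    // any hyphen followed by a non-digit (e.g. "terms - service", "-abc");
    // hyphens before a digit, as in "10-20", are kept
    input = Regex.Replace(input, @"-(?=\D)", " ");

    return input;
}
```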
Special Characters in Search Term
I am getting an error in ezSearch.cshtml if the search string contains a - or a :,
like:
solution: explorer
or
terms - service
These are clearly the special characters of the search query syntax (-, +, :).
So my question is: how do I handle these? Is there a config file to disable the special search characters?
Hi keilo!
I was able to fix this by replacing those special characters with spaces (or nothing) before searching. I did this by adding a new helper function in ezSearch.cshtml (code sample edited to consolidate hyphen removal):
Then I call that function before the searchTerm gets tokenized (at approximately line 64 of ezSearch.cshtml).
This probably isn't the best way to handle it, but it does stop the error from happening while returning mostly reasonable results (which is most important for me). You can also customize how you want those characters handled. Since I'm using the Standard analyzer, getting rid of them works best for me, but your mileage may vary. There is probably a more efficient way to do this, which is why I haven't submitted a pull request.
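A minimal, self-contained sketch of that order of operations: escape first, tokenize second. The class name and the Tokenize method are hypothetical (ezSearch.cshtml's own tokenizing code may differ), and splitting on spaces is an assumption. The hyphen rule is written with a lookahead so that every hyphen followed by a non-digit is replaced, not only the last one in a run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration; not part of ezSearch.
public static class SearchTermTokenizer
{
    // Colons become spaces; hyphens followed by a non-digit become spaces;
    // hyphens before a digit ("10-20") are kept.
    public static string EscapeSearchTerm(string input)
    {
        input = input.Replace(":", " ");
        return Regex.Replace(input, @"-(?=\D)", " ");
    }

    // Escape first, then split on spaces, so the replaced characters can
    // never reach the Lucene query as part of a token.
    public static string[] Tokenize(string searchTerm)
    {
        return EscapeSearchTerm(searchTerm)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the two terms from the question, `Tokenize("solution: explorer")` gives `{ "solution", "explorer" }` and `Tokenize("terms - service")` gives `{ "terms", "service" }`.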
Hey Lewis
Many thanks for sharing your changes! This is very clean and makes me wonder why it's not there by default.
I also noticed the addition of !Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET.Contains(x), which wasn't in the original ezSearch.cshtml and also solves the issue with and/or/by stop words. Without it, searching for text that contains Lucene stop words wasn't returning anything.
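A sketch of what that filter does, assuming ezSearch requires every token of the search term to match: the analyzer drops stop words when indexing, so a required "and" can never match, which is presumably why such searches came back empty. To keep the example runnable without a Lucene.Net reference, EnglishStopWords is a local copy of Lucene's default English stop-word list standing in for StopAnalyzer.ENGLISH_STOP_WORDS_SET; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; EnglishStopWords stands in for
// Lucene.Net.Analysis.StopAnalyzer.ENGLISH_STOP_WORDS_SET.
public static class StopWordFilter
{
    private static readonly HashSet<string> EnglishStopWords = new HashSet<string>(
        new[]
        {
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
            "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
            "that", "the", "their", "then", "there", "these", "they", "this",
            "to", "was", "will", "with"
        },
        StringComparer.OrdinalIgnoreCase);

    // Drop stop words so they never become required terms in the query.
    public static List<string> RemoveStopWords(IEnumerable<string> tokens)
    {
        return tokens.Where(x => !EnglishStopWords.Contains(x)).ToList();
    }
}
```

The set here ignores case, so "And" is dropped as well; Lucene's own set holds lowercase words and a plain Contains(x) check is case-sensitive, so the original filter only catches lowercase stop words.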
Just curious: in your live implementation, did you escape only those two characters (colon and hyphen) in the search term? Are those two adequate, or would it be better to add others?
Many thanks!
When doing Examine/Lucene queries, I have an extension method to remove characters that would break the query.
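A hypothetical sketch of such an extension method, assuming it simply blanks out the characters that have special meaning in Lucene's classic query syntax. Only the name MakeSearchQuerySafe appears in this thread; the character list and the body are assumptions.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical implementation; only the method name comes from the thread.
public static class SearchQueryExtensions
{
    // Characters with special meaning in Lucene's classic query syntax.
    private const string LuceneSpecialChars = "+-&|!(){}[]^\"~*?:\\/";

    // Replace every special character with a space, then collapse runs of
    // whitespace and trim, so the result can be passed straight to a query.
    public static string MakeSearchQuerySafe(this string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return string.Empty;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(LuceneSpecialChars.IndexOf(c) >= 0 ? ' ' : c);
        }

        return Regex.Replace(sb.ToString(), @"\s+", " ").Trim();
    }
}
```

With this version, `"solution: explorer".MakeSearchQuerySafe()` returns `"solution explorer"`. Unlike a hyphen rule that skips digits, it also splits "10-20" into "10 20"; blanking characters out trades exact punctuation matches for queries that always parse.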
Regards
Ismail
Thanks Ismail, I just implemented this for Our as well, finally. :-)
Hey Ismail
Thanks for sharing this. How do you call this function? Do you call it on every search (by default), or only after some kind of try-catch?
I'm also wondering what you use to handle stop words when someone searches for a legitimate sentence like "abc and bde by xyz".
Would it be adequate to use such a tokenizer in ezSearch.cshtml?
keilo,
You can call it directly on your search term, which is a string; it's an extension method, so:
model.SearchTerm.MakeSearchQuerySafe()
I have never done anything with stop words, as I have always used the standard analyser: it removes stop words during indexing and also during querying. I believe ezSearch uses the standard analyser, so in your example
abc and bde by xyz
you will end up with abc bde xyz in the index during indexing, and when searching the stop words will be removed as well.
Regards
Ismail
It is possible, and preferable, to use Lucene's QueryParser class.
Its static Escape method adds a backslash before special characters.
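In Lucene.Net 3.x the call is `Lucene.Net.QueryParsers.QueryParser.Escape(searchTerm)`. To show what it does without a Lucene.Net reference, here is a self-contained sketch that mirrors it. The class name is hypothetical, and the character list follows Lucene's classic query syntax; treat it as an assumption about your version (for example, '/' was only added to the escaped set in later Lucene releases).

```csharp
using System.Text;

// Hypothetical stand-in that mirrors QueryParser.Escape: prefix each special
// character with a backslash so the parser treats it as a literal.
public static class LuceneEscaper
{
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&/";

    public static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length * 2);
        foreach (char c in s)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Escaping keeps the characters as literals in the query instead of replacing them with spaces, so `Escape("solution: explorer")` yields `solution\: explorer`. Whether that finds anything depends on the analyzer: the standard analyser generally does not keep punctuation such as colons or braces in the index.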
Hi,
I'm running into this issue too with special characters like colons :, brackets {}, etc.
How can I go about fixing this in ezsearch.cshtml?
I'm assuming I just have to make changes to the public string CleanseSearchTerm, but I'm not sure what to do.
I have this so far: