Examine / Lucene Special Characters
Hello!
I've built a search feature, but yesterday I came across some bad behaviour when using special characters. I searched the web and found that the QueryParser could escape the queries for me, but it's not working as expected.
I use a combination of Examine and Lucene, but in the end I give my searcher the raw query, so Examine handles the search.
First I escape the query with the Lucene QueryParser, then split the query into a list of terms, but when it goes through my MultiFieldQueryParser, the terms with special characters lose the \\ (\\-).
Bottom line: the search doesn't handle special characters, especially not when one is the only character in a term (e.g. where is the - fish). I'm not sure if it's because I use fuzzy or whatever..
When using the Cogworks Examine Inspector, it seems that if the search term is ean-numre, it strips the - and ends up with "ean numre". Is this the approach?
Hopefully someone can help me out.
-Niclas Schumacher
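To make the escaping and splitting steps described above concrete, here is a minimal sketch in plain Python. It is not Lucene's or Examine's code: the helper names `escape_term` and `prepare_terms` are hypothetical, and the character set is an assumption based on the special characters listed in the Lucene query syntax documentation (it may differ between Lucene versions). Dropping terms that contain no letters or digits is one possible way to deal with a lone "-", not necessarily the right one for every index.

```python
# Characters the Lucene query syntax treats as special (assumed list, see above).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')


def escape_term(term):
    """Prefix every special character in the term with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def prepare_terms(raw_query):
    """Split on whitespace, drop terms with no letters or digits, escape the rest."""
    terms = []
    for term in raw_query.split():
        if not any(ch.isalnum() for ch in term):
            # A term such as a lone "-" carries nothing searchable, so skip it.
            continue
        terms.append(escape_term(term))
    return terms


if __name__ == "__main__":
    print(prepare_terms("where is the - fish"))
    print(escape_term("ean-numre"))
```

With this sketch the lone "-" never reaches the query parser, and a term such as ean-numre is passed on with its hyphen escaped.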
Niclas,
In Examine Inspector I do not clean up the query string; I just run it through the QueryParser, and that cleans it up. Additionally, I would check your index. I think that during indexing the standard analyser strips out special characters, so search for ean or numre in Examine Inspector and see what you get. I am certain it won't have - in the field.
Regards
Ismail
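The effect Ismail describes, an analyser dropping special characters at index time, can be illustrated with a rough approximation in Python. This is only a sketch under a simplifying assumption: the hypothetical `tokenize` function lowercases the text and keeps runs of letters and digits, which is much cruder than the real StandardAnalyzer grammar.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())


if __name__ == "__main__":
    # The hyphen acts as a separator, so the indexed terms contain no "-".
    print(tokenize("EAN-numre"))
    # An escaped hyphen ends up as the same two terms once the text is analysed.
    print(tokenize("ean\\-numre"))
```

If the index only holds terms like these, escaping the hyphen in the query cannot make it match anything, because there is no hyphen left in the indexed terms to match against.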
@Ismail - in the inspector it seems like it does have - in it: ean-numre. But it seems that the analyzer converts ean-numre to "ean numre", which actually gives me the right docs.
The same goes if I specifically make my search query look like "ean-numre": I find the right ones. But not if I have ean-numre without the quotes..
I can't seem to figure out how to work around this; I thought the analyzer would do that for me.
For the record, the external analyzer uses the standard analyzer, right?
Niclas,
Without the quotes and the - it would be doing an OR; with quotes it's a phrase query. Both should work. The external analyser is the standard analyser.
Regards
Ismail
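The difference between the OR search and the phrase search mentioned above can be sketched with a toy example in Python. Everything here is hypothetical and only illustrates the two matching rules: the sample documents are made up, `tokenize` is a crude stand-in for the analyser, and `matches_any` and `matches_phrase` are not Lucene or Examine functions.

```python
import re


def tokenize(text):
    """Crude stand-in for an analyser: lowercase runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())


def matches_any(doc_tokens, query_tokens):
    """OR semantics: the document contains at least one of the query terms."""
    return any(term in doc_tokens for term in query_tokens)


def matches_phrase(doc_tokens, query_tokens):
    """Phrase semantics: the query terms appear next to each other, in order."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


# Made-up sample documents, keyed by a hypothetical document id.
docs = {
    1: "EAN-numre for all products",
    2: "The EAN code is printed on the box",
    3: "Numre and letters",
}

query = tokenize("ean-numre")
or_hits = [i for i, text in docs.items() if matches_any(tokenize(text), query)]
phrase_hits = [i for i, text in docs.items() if matches_phrase(tokenize(text), query)]

if __name__ == "__main__":
    print("OR hits:", or_hits)
    print("Phrase hits:", phrase_hits)
```

In this toy data the OR search returns every document that mentions either word, while the phrase search returns only the document where the two words sit side by side, which may explain why the quoted query looks more precise.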