Examine Lucene Partial Matching on Non English Words
Hi,
I am using Umbraco 4.7 and currently have Examine set up and able to search the site. However, I want to search on product codes, which are not English words, and to support partial matching on those codes, e.g.
product 1 - RTGUHJK
product 2 - RTGUTTT
product 3 - RFGGUUI
So if I search for "RTG", I want Examine to return products 1 and 2. However, since the codes are not dictionary words, they are not found using wildcard searching such as *.
What I have tried:
- I have used a WhitespaceAnalyzer instead of a StandardAnalyzer
- I have examined the index and the related field, and the codes are being indexed
- Wildcard (*) expressions do not work
In theory, I would like the search to apply contains() or startsWith() logic in order to find the related document.
Any help or advice is greatly appreciated
Thank You
R
any reply on this???
So much for "The friendliest CMS community on the planet".
I don't think whether they are dictionary words has anything to do with it: they are either in the index and searchable, or they are not.
For your wildcard lookup, did you look into using .MultipleCharacterWildcard()?
Thank you for your reply.
The product codes do exist within the index. I have used MultipleCharacterWildcard, which works for codes "starting with" the search query but not for "contains".
e.g.
product code: RTGUHJK
searching RTG
gives RTGUHJK
however searching TGU
returns nothing
I have tried to enable leading wildcard characters as per this post:
http://our.umbraco.org/forum/developers/api-questions/12168-Examine-Leading-wildcards?p=1
However, it is still not working: no results are returned.
Any suggestions?
OK, I have managed to get this working as follows:
sc.GroupedOr(new string[] { "productCode" },
    Examine.LuceneEngine.SearchCriteria.LuceneSearchExtensions.MultipleCharacterWildcard("*" + searchQuery)).Compile();
I was missing the leading "*".
Thanks
P.S. Why is there no documentation / API docs for Examine?
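For anyone landing on this thread later, the working query above can be sketched in context roughly as follows. This is only a sketch under assumptions: the searcher name "ExternalSearcher", the class name ProductCodeSearch, and the method name Contains are hypothetical, and it assumes the Examine version that ships with Umbraco 4.x. Only the "productCode" field and the leading "*" come from the post. As far as I understand it, MultipleCharacterWildcard() appends the trailing "*" itself, which is why only the leading one has to be added by hand.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Examine.LuceneEngine.SearchCriteria;
using Examine.SearchCriteria;

public static class ProductCodeSearch
{
    // Hypothetical searcher name: use whichever searcher is defined
    // in your own ExamineSettings.config.
    private const string SearcherName = "ExternalSearcher";

    // Returns documents whose productCode field contains searchQuery.
    public static IEnumerable<SearchResult> Contains(string searchQuery)
    {
        // An empty term would turn into a match-everything wildcard, so bail out early.
        if (string.IsNullOrEmpty(searchQuery))
        {
            return Enumerable.Empty<SearchResult>();
        }

        var searcher = ExamineManager.Instance.SearchProviderCollection[SearcherName];
        ISearchCriteria sc = searcher.CreateSearchCriteria();

        // Leading "*" added by hand; the extension method supplies the trailing one,
        // so a search for TGU is sent to Lucene as *TGU*.
        ISearchCriteria criteria = sc
            .GroupedOr(new[] { "productCode" }, ("*" + searchQuery).MultipleCharacterWildcard())
            .Compile();

        return searcher.Search(criteria).ToList();
    }
}
```

Calling ProductCodeSearch.Contains("TGU") should then build the same query as the one-liner above. Bear in mind that a leading wildcard makes Lucene walk every term in the field, so this is fine for a modest product catalogue but can get slow on a large index.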
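I've done a couple of Lucene implementations recently (using a few APIs, Examine and Sitecore.Search), and I'd like to add a couple of notes you might find useful:
- Commonly used words are ignored by default, so if you're searching for products starting with 'A', you're probably not going to get a result set. You'll probably want to disable that in config. I don't recall if this only applies to the standard analyser...
- I think adding a boost to products might be a good idea in this scenario.
- IIRC, the whitespace analyser is case sensitive, which is helpful to bear in mind in these scenarios.
- There's some escaping behaviour that happens here too.
All of this is based on Lucene functionality. Examine isn't much of an expansion from Lucene like Sitecore.Search is, so the Lucene documentation is the place to go here. I do agree that Examine's documentation is poor, though the Umbraco.tv video is supposed to be very good.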
I personally only started using Umbraco this week, so I have no idea about the docs, but like you I do wish they were more fleshed out in certain places. The joy of open source, I guess. Once you work it out, you can write the docs, and some other poor sod following you won't have the same trauma as you... or me :)