Search not returning expected results
I'm using Umbraco 7.2.1 on .NET 4.5.2 on a Windows 7 machine. I set up a basic search index and provider using the standard analyzer. I did not set any specific index fields, so that it will just index everything. When I perform a basic search on the singular and plural versions of a word, I get different results. The weird part is that I get fewer results with the singular version: searching on 'solutions' gives me 11 results, but 'solution' only gives me 2. It seems like it must be doing an exact word search, but I thought the standard analyzer would find all or part of a word. My query and config are below. Should I be using a different analyzer?
When I use the Examine Management dashboard in the backoffice I can reproduce this behavior not just in my custom index, but also in the InternalIndex/Searcher.
As far as I know, the Standard Analyser only breaks the search query up into terms, without any stemming (it lowercases terms and drops stop words, but otherwise leaves them as they are), so the number of results is based on exactly matching terms. I've never used it myself, but I think SnowballAnalyzer can achieve this?
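The exact-term behaviour described above can be illustrated with a toy inverted index in Java. Lucene is a Java library and Lucene.Net is a port of it. This is a minimal sketch that assumes a StandardAnalyzer-like pipeline reduced to lowercasing and splitting on non-alphanumeric characters. It is not Lucene's actual tokenizer, and the class and method names are hypothetical.

```java
import java.util.*;

// Toy model of exact-term matching (assumption: lowercase + split on
// non-alphanumerics, no stemming). Not Lucene's real StandardAnalyzer.
class ExactTermDemo {
    // Break text into lowercase terms; no stemming is applied.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Build an inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // A query term only hits documents that contain exactly that term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "Our solutions for retail",
            "A solution for banking",
            "More Solutions here");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "solutions")); // [0, 2]
        System.out.println(search(index, "solution"));  // [1]
    }
}
```

The index stores 'solution' and 'solutions' as two unrelated terms. As a result, each query finds only the documents that contain its own exact form.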
Ryan,
Try wildcarding your search term with searchTerm.MultipleCharacterWildcard() (it's an extension method) and see if that helps. If you need stemming, you will need to use Snowball.
Regards
Ismail
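A trailing wildcard turns the term into a prefix, and the prefix is expanded against the sorted term dictionary. This is why the wildcard suggestion works. The hypothetical Java sketch below assumes that MultipleCharacterWildcard() effectively produces a query like solution*. It illustrates the idea only and is not Lucene's or Examine's implementation.

```java
import java.util.*;

// Sketch of answering a trailing-wildcard query ("solution*"): expand the
// prefix against a sorted term dictionary, then union the postings.
// Assumption: this mirrors the idea of a prefix query, not Lucene's code.
class PrefixWildcardDemo {
    // Collect every dictionary term that starts with the prefix.
    static Set<String> expand(NavigableMap<String, Set<Integer>> dict, String prefix) {
        Set<String> matches = new TreeSet<>();
        // Terms sharing a prefix form a contiguous run in sorted order.
        for (String term : dict.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break;
            matches.add(term);
        }
        return matches;
    }

    // Union the document ids of all expanded terms.
    static Set<Integer> search(NavigableMap<String, Set<Integer>> dict, String prefix) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : expand(dict, prefix)) hits.addAll(dict.get(term));
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<Integer>> dict = new TreeMap<>();
        dict.put("solution", Set.of(1));
        dict.put("solutions", Set.of(0, 2));
        dict.put("solve", Set.of(3));
        System.out.println(expand(dict, "solution")); // [solution, solutions]
        System.out.println(search(dict, "solution")); // [0, 1, 2]
    }
}
```

The expansion only works in one direction. A query for solutions* would still miss a document that contains only "solution", which is one reason stemming can be the better fit.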
I was trying to do a basic search (on all fields, since there aren't many). I couldn't get the wildcard search working because it wanted me to supply the field(s) to search on, so I concatenated the data into a single field in the GatheringNodeData event and searched on that field. It works now.
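The concatenation workaround can be sketched as below. The sketch assumes the handler sees a node's fields as a name-to-value map. In Umbraco, the equivalent code would be C# inside a GatheringNodeData handler. The field name combinedContents and both helper methods are hypothetical.

```java
import java.util.*;

// Sketch of the "concatenate everything into one field" workaround.
// Assumptions: fields arrive as a name -> value map, and the combined
// field name "combinedContents" is hypothetical.
class CombinedFieldDemo {
    static final String COMBINED = "combinedContents";

    // Join every non-blank field value into one extra field; originals stay intact.
    static Map<String, String> addCombinedField(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner(" ");
        for (String value : fields.values()) {
            if (value != null && !value.isBlank()) joined.add(value);
        }
        Map<String, String> result = new LinkedHashMap<>(fields);
        result.put(COMBINED, joined.toString());
        return result;
    }

    // Prefix match against the single combined field, so the query
    // no longer has to list every field by name.
    static boolean matches(Map<String, String> doc, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        String text = doc.getOrDefault(COMBINED, "").toLowerCase(Locale.ROOT);
        for (String term : text.split("\\W+")) {
            if (!term.isEmpty() && term.startsWith(p)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> node = new LinkedHashMap<>();
        node.put("nodeName", "Retail");
        node.put("bodyText", "Our solutions for shops");
        node.put("summary", "");
        Map<String, String> indexed = addCombinedField(node);
        System.out.println(indexed.get(COMBINED));       // Retail Our solutions for shops
        System.out.println(matches(indexed, "solution")); // true
    }
}
```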
I was not able to get the Snowball analyzer working: I couldn't get it configured correctly, and couldn't find much documentation on how to configure it properly for use in Umbraco. I will likely look into it more in another project.
It would definitely be useful to see some documentation/examples of how other people have implemented their own search in Umbraco without having to spend a lot of time poring through forum posts. Wishful thinking I guess.
Thanks.
Ryan,
I have used Snowball, but with older Umbraco versions before Examine. In this instance we have Examine as a wrapper around things, so I'm not sure how you would use Snowball, unless you run your term through Snowball before you query?
Regards
Ismail
Yeah documentation is a real issue with the search stuff!
Valerie,
With regard to docs, I would recommend:
https://our.umbraco.org/Documentation/Reference/Searching/Examine/index
https://www.youtube.com/watch?v=6AMb0rrSrJw
http://thecogworks.co.uk/blog/posts/2012/november/examiness-hints-and-tips-from-the-trenches-part-1/ (there are 10 parts; they expand on topics in my presentation)
Work through them in that order and you will be an Examine guru in no time ;-}
Is there anything out there in terms of setting up Facets? I saw that Shannon had a branch on the go at some point but it doesn't seem to have been finished.
Edit: Never mind, I just saw it's covered in your video! Thanks for this :D I've been struggling with them for a while.
I haven't used anything in Shannon's branch, but I have played with Bobo, and there is a really good article on it from 24 Days: http://24days.in/umbraco/2014/search-with-bobo/
Ryan,
I did some reading. You could download the Snowball analyzer from Lucene.Net Contrib and build it, then replace the analyzer with Snowball for the indexer and searcher in the Examine config. Indexing will then stem terms, and searching will run the query term through stemming as well. I haven't tried this, but I reckon it should work.
Regards
Ismail
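A deliberately crude stemmer shows why the same stemming has to run at both index time and query time. This also covers the earlier idea of running only the query term through Snowball. The sketch is minimal: toyStem is a hypothetical suffix stripper standing in for the real Snowball English stemmer, and it is not that algorithm.

```java
import java.util.*;
import java.util.function.Function;

// Why the same analyzer must run at index time AND query time.
// Assumption: toyStem is a crude stand-in for the Snowball English
// stemmer; it is NOT the real algorithm.
class StemmingDemo {
    static String toyStem(String term) {
        for (String suffix : new String[] {"ions", "ion", "s"}) {
            // Only strip when a reasonable stem (4+ chars) remains.
            if (term.length() > suffix.length() + 3 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    // Index each lowercase word after passing it through the given analyzer.
    static Map<String, Set<Integer>> index(List<String> docs, Function<String, String> analyzer) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String raw : docs.get(id).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (raw.isEmpty()) continue;
                idx.computeIfAbsent(analyzer.apply(raw), k -> new TreeSet<>()).add(id);
            }
        }
        return idx;
    }

    // Single-term query, analyzed before lookup.
    static Set<Integer> search(Map<String, Set<Integer>> idx, String query, Function<String, String> analyzer) {
        return idx.getOrDefault(analyzer.apply(query.toLowerCase(Locale.ROOT)), Set.of());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Our solutions for retail", "A solution for banking");
        Function<String, String> none = Function.identity();
        Function<String, String> stem = StemmingDemo::toyStem;

        // Stemming only the query: "solution" becomes "solut",
        // which the unstemmed index never contains.
        System.out.println(search(index(docs, none), "solution", stem));  // []
        // Same analyzer on both sides: both forms reduce to "solut".
        System.out.println(search(index(docs, stem), "solution", stem));  // [0, 1]
        System.out.println(search(index(docs, stem), "solutions", stem)); // [0, 1]
    }
}
```

For the same reason, the index needs rebuilding after you switch analyzers: documents indexed before the change still hold unstemmed terms.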
Hi Ismail. I'm considering using the Snowball analyzer. But I'm not clear how I'd specify the "English" version, and also the stop words, in the examine config file. The constructor mentions that these need to be specified; e.g. the stop words will be empty otherwise. Perhaps the analyzer can be added in code.
I managed to get my results to behave like this by adding "fuzzy" to the criteria (I wanted "politic" to return "politics" and "political"):
https://gist.github.com/anonymous/6076f3fdd1e6e3f66b1d
Like here http://umbraco.com/follow-us/blog-archive/2011/9/16/examining-examine
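Fuzzy matching works differently from stemming: it accepts dictionary terms that are within a small edit distance of the query. The sketch below assumes a similarity of 1 - distance / min(query length, term length). That is our simplified reading of how Lucene 2.9's FuzzyQuery scores terms when no prefix length is set. It uses 0.5 as the threshold, which we believe is FuzzyQuery's default minimum similarity. The class and method names are hypothetical.

```java
import java.util.*;

// Sketch of fuzzy term matching by edit distance.
// Assumption: similarity = 1 - distance / min(len(query), len(term)),
// a simplified model of Lucene 2.9 FuzzyQuery scoring with no prefix.
class FuzzyDemo {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static double similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) return 0.0;
        return 1.0 - (double) levenshtein(query, term) / shorter;
    }

    // Keep every dictionary term at or above the similarity threshold.
    static List<String> fuzzyExpand(Collection<String> terms, String query, double minSimilarity) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (similarity(query, t) >= minSimilarity) out.add(t);
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("politics", "political", "poll", "banking");
        System.out.println(fuzzyExpand(terms, "politic", 0.5)); // [political, politics]
    }
}
```

Here "politic" is one edit away from "politics" and two edits away from "political", so both clear the threshold. Fuzzy matching can also admit unrelated words that happen to be spelled similarly, which makes it a looser tool than stemming.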
Mark,
You will need to create your own analyzer that inherits from Snowball, hardcode the stop words and the stemmer to use (e.g. "English") in its constructor, and pass those up to the Snowball base constructor. Then, in the Examine config, use your class instead of using Snowball directly?
Regards
Ismail
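The subclassing advice makes sense if, as seems likely, the config can only name a type, so whatever loads it needs a constructor with no arguments. The following hypothetical Java sketch shows the pattern. SnowballLikeAnalyzer stands in for the Snowball analyzer and EnglishSnowballAnalyzer for your own class. Neither is a real Lucene type, and the stop-word array is an illustrative subset.

```java
import java.util.*;

// Sketch of "subclass and hardcode the constructor arguments".
// Assumption: a config-driven loader can only create types that have a
// public no-argument constructor. All class names here are hypothetical.
class SnowballLikeAnalyzer {
    final String stemmerName;
    final Set<String> stopWords;

    // The base class needs its settings passed in up front.
    SnowballLikeAnalyzer(String stemmerName, String[] stopWords) {
        this.stemmerName = stemmerName;
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }
}

class EnglishSnowballAnalyzer extends SnowballLikeAnalyzer {
    // Illustrative subset of English stop words, not Lucene's full list.
    private static final String[] STOP_WORDS = {"a", "an", "and", "of", "the", "to"};

    // No-argument constructor that passes hardcoded values to the base class.
    public EnglishSnowballAnalyzer() {
        super("English", STOP_WORDS);
    }
}

class AnalyzerLoader {
    // Create an analyzer from a type name, the way a config-driven loader might.
    static SnowballLikeAnalyzer load(String typeName) {
        try {
            Class<?> type = Class.forName(typeName);
            return (SnowballLikeAnalyzer) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create analyzer " + typeName, e);
        }
    }

    public static void main(String[] args) {
        SnowballLikeAnalyzer a = load(EnglishSnowballAnalyzer.class.getName());
        System.out.println(a.stemmerName + " " + a.stopWords.contains("the")); // English true
    }
}
```

Loading the base class by name fails because it has no no-argument constructor. The subclass removes that obstacle.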
Thanks Ismail. That makes obvious sense. One other query I had was around the stop words. Is there a standard array of English stop words within Lucene.Net that I can use? Or will I need to generate my own array?
Mark,
You can get them directly from the Lucene.Net StandardAnalyzer class, where they're exposed as a property (see here), so you should be able to do
It returns a CharArraySet, so you may have to fiddle with that to get a list of strings, unless Snowball will take a CharArraySet.
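If the stop-word property returns a set instead of a string array, converting it is mostly a loop. This sketch assumes the set yields either String or char[] elements; we are not certain which of these a given Lucene.Net CharArraySet exposes when enumerated. StopWordConversion is a hypothetical helper.

```java
import java.util.*;

// Sketch: turn a stop-word set into the String[] a Snowball-style
// constructor expects. Assumption: elements are String or char[].
class StopWordConversion {
    static String[] toStringArray(Collection<?> stopSet) {
        List<String> words = new ArrayList<>();
        for (Object o : stopSet) {
            if (o instanceof String) words.add((String) o);
            else if (o instanceof char[]) words.add(new String((char[]) o));
            else throw new IllegalArgumentException("Unexpected element: " + o);
        }
        // Sorted only for a deterministic result; order does not matter for a stop set.
        Collections.sort(words);
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<Object> stopSet = new HashSet<>(List.of("the", "and"));
        stopSet.add("a".toCharArray());
        System.out.println(Arrays.toString(toStringArray(stopSet))); // [a, and, the]
    }
}
```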
Hi Ismail,
I'm starting on the development work for this today, but have another question. The version of Lucene.Net in Umbraco 6.1.4 is 2.9.4 (file version 2.9.4.1). The only contrib version I can find is 3.0.3, which requires Lucene.Net 3.0.3.
Do you know if there will be any issues in updating the Lucene.Net dll to version 3.0.3? Or am I safer to try and find the 2.9.4 compatible version of contrib? Do you know where I can find it?
Thanks, Mark
Mark,
Any reason why you want the latest version of Lucene.Net? I suspect Examine would break if you replaced the version of Lucene.Net it's been built against.
Regards
Ismail
Hi Ismail,
That's what I was asking. I don't want to use the latest version of Lucene (3.0.3); I want to use version 2.9.4 (used by Umbraco 6.1.4). The problem is, I can't find a downloadable version of Contrib 2.9.4. All I can find is Contrib version 3.0.3, but that requires Lucene.Net 3.0.3. I'm trying to avoid the situation you're describing. :-)
The thing is, I haven't had cause to download lucene before, and I'm struggling to find it.
Regards, Mark
Eureka! I've found it on Nuget:
Install-Package Lucene.Net.Contrib -Version 2.9.4.1
For some reason I just can't seem to find what I'm looking for here:
http://lucenenet.apache.org/
I end up trawling through subversion web pages that don't provide the ability to download anything. I'm obviously missing something!
Anyway, thanks for all your help, Ismail
For anyone uncertain how to implement the Snowball Analyzer, this is what I did:
Note: this was on Umbraco 6.1.4. For other versions, check that you are using the version of Lucene.Net.Contrib that matches the Lucene.Net version your Umbraco install ships with.
Created a new class library in Visual Studio called LuceneNetContrib.
Installed the NuGet package:
Added the following class to the class library:
Compiled the project and copied "Lucene.Net.Contrib.Snowball.dll" and "LuceneNetContrib.dll" to the Umbraco /bin/ folder.
Then, in the Umbraco site, opened "/Config/ExamineSettings.config" and specified the analyzer. So:
Becomes:
Do the same thing for the searcher, so:
Becomes:
Finally, rebuilt the index via the CMS (Developer section, Examine Management tab) and tested. All working as expected!