Search multiple fields for multiple terms with examine
I'm trying to set up a search that searches multiple fields for multiple terms with Examine on an Umbraco 4.7.0 install. I started out with some code I had used before on a 4.5.2 site, which basically looks like this:
public static List<Cupcake> SearchCupcakes(string searchTerm)
{
    var criteria = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"].CreateSearchCriteria(BooleanOperation.Or);

    string[] terms = searchTerm.Contains(" ") ? searchTerm.Split(' ') : new[] { searchTerm };
    var fields = new[] { "id", "firstName", "lastName", "nodeName" };

    // Throws IndexOutOfRangeException for a multi-word searchTerm such as "Super Man"
    var query = criteria.GroupedOr(fields, terms);

    var results = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"].Search(query.Compile());
    if (results.Any())
        return results.Select(r => new Cupcake(r)).ToList();
    return null;
}
The problem is that I can't get the criteria to create the query when there's more than one search term (i.e. when the searchTerm string is "Super Man", for example). The code always throws "IndexOutOfRangeException: Index was outside the bounds of the array" when searchTerm is split into multiple strings.
The problem also occurs when I try the GroupedOr overload that takes an IExamineValue[] as its second parameter.
It feels like some kind of bug, since similar code worked earlier, and the problem goes away when the array contains just a single item.
I have had a similar issue with Examine, and as far as I remember there seemed to be some kind of mismatch between the fields and terms in the GroupedOr method. One option would be to use multiple GroupedOr calls, but that would be a bit clumsy, I guess.
Have you tried using IExamineValue[] instead of string[] as the second parameter to GroupedOr? There might be something there. I'm using new[] { searchTerm.MultipleCharacterWildcard() } as the second parameter for an autocomplete search, which works fine.
Do you have a stack trace for the "IndexOutOfRangeException: Index was outside the bounds of the array" error? It would be interesting to see where the error occurs, because the overload you are using just calls the overload I was referring to in my previous post, which in turn calls the internal GroupedOrInternal method.
... and I just noted this comment in the code:
protected internal IBooleanOperation GroupedOrInternal(string[] fields, IExamineValue[] fieldVals, BooleanClause.Occur occurance)
{
    //if there's only 1 query text we want to build up a string like this:
    //(field1:query field2:query field3:query)
    //but Lucene will bork if you provide an array of length 1 (which is != to the field length)
/// <summary>
/// Creates our own style 'multi field query' used internal for the grouped operations
/// </summary>
/// <param name="fields"></param>
/// <param name="fieldVals"></param>
/// <param name="occurance"></param>
/// <returns>A new <see cref="Examine.SearchCriteria.IBooleanOperation"/> with the clause appended</returns>
protected internal BooleanQuery GetMultiFieldQuery(string[] fields, IExamineValue[] fieldVals, BooleanClause.Occur occurance)
{
    //if there's only 1 query text we want to build up a string like this:
    //(!field1:query !field2:query !field3:query)
    //but Lucene will bork if you provide an array of length 1 (which is != to the field length)

    var queryVals = new IExamineValue[fields.Length];
    if (fieldVals.Length == 1)
    {
        for (int i = 0; i < queryVals.Length; i++)
            queryVals[i] = fieldVals[0];
    }
    else
    {
        queryVals = fieldVals;
    }

    var qry = new BooleanQuery();
    for (int i = 0; i < fields.Length; i++)
    {
        qry.Add(this.GetFieldInternalQuery(fields[i], queryVals[i], true), occurance);
    }

    return qry;
}
That looks like it's being split up pairwise (value i goes with field i), so the method fails when there is more than one search term but fewer terms than fields.
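To see why the split fails, here is a minimal sketch that reproduces only the array handling from GetMultiFieldQuery above. Plain strings stand in for IExamineValue, and there are no Lucene types. The class and method names are hypothetical. With four fields and two terms, the loop reaches queryVals[2] on a two-element array and throws. With a single term, that term is copied to every field.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for GetMultiFieldQuery: same array handling,
// but plain strings instead of IExamineValue and no Lucene BooleanQuery.
public static class MultiFieldPairing
{
    public static List<string> Pair(string[] fields, string[] fieldVals)
    {
        var queryVals = new string[fields.Length];
        if (fieldVals.Length == 1)
        {
            // One value: copy it into every slot, one per field.
            for (int i = 0; i < queryVals.Length; i++)
                queryVals[i] = fieldVals[0];
        }
        else
        {
            // Several values: used as-is, with no length check against fields.
            queryVals = fieldVals;
        }

        var clauses = new List<string>();
        for (int i = 0; i < fields.Length; i++)
        {
            // Value i is paired with field i; this indexing throws
            // IndexOutOfRangeException when there are fewer values than fields.
            clauses.Add(fields[i] + ":" + queryVals[i]);
        }
        return clauses;
    }
}
```

Note that with more values than fields the extra values are silently ignored rather than causing an error.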
Thanks for your reply. As stated, I did try using IExamineValue[]; that didn't work either, and it throws the same exception.
I'm using new[] { LuceneSearchExtensions.MultipleCharacterWildcard(searchTerm) }, since string.MultipleCharacterWildcard() is marked as obsolete in the version I have. But as far as I can tell, that searches for the entire string including spaces, with a * appended to the end, thus not finding anything.
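One alternative is to split the input on whitespace first and wildcard each word separately, so that each word can go into its own GroupedOr call. The sketch below is a minimal version of that idea. It works on plain strings only, and the class name is hypothetical. In Examine, each word would still be wrapped with the library's own MultipleCharacterWildcard extension rather than having "*" appended by hand.

```csharp
using System;
using System.Linq;

public static class WildcardTerms
{
    // Hypothetical helper: splits on any whitespace (null separator), drops empty
    // entries, and appends "*" to each word: "Super Man" -> { "Super*", "Man*" }.
    public static string[] PerWord(string searchTerm)
    {
        return searchTerm
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word + "*")
            .ToArray();
    }
}
```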
For future reference, I ended up using the "multiple grouped or", which gave me an opportunity to revisit a code structure I haven't used for a long time:
public static List<Cupcake> SearchCupcakes(string searchTerm)
{
    var criteria = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"].CreateSearchCriteria(BooleanOperation.Or);
    var query = GetFilter(searchTerm, criteria);
    var results = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"].Search(query.Compile());
    if (results.Any())
        return results.Select(r => new Cupcake(r)).ToList();
    return null;
}

private static IBooleanOperation GetFilter(string searchTerm, ISearchCriteria criteria)
{
    // If searching for a number we will search for an ID
    int cakeId;
    if (int.TryParse(searchTerm.Trim(), out cakeId))
    {
        return criteria.Field("id", searchTerm.Trim());
    }

    // Handle searching for multiple terms: one GroupedOr per term, joined with Or()
    var textFields = new[] { "firstName", "lastName", "nodeName" };
    var terms = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if (terms.Length > 1)
    {
        var result = criteria.GroupedOr(textFields, new[] { LuceneSearchExtensions.MultipleCharacterWildcard(terms[0]) });
        for (int i = 1; i < terms.Length; i++)
        {
            // Keep the returned operation so every term ends up in the query
            result = result.Or().GroupedOr(textFields, new[] { LuceneSearchExtensions.MultipleCharacterWildcard(terms[i]) });
        }
        return result;
    }

    // Handle searching for single term
    return criteria.GroupedOr(textFields, new IExamineValue[] { LuceneSearchExtensions.MultipleCharacterWildcard(searchTerm.Trim()) });
}
Yeah, there does seem to be something in that method. Not sure I understand it, though. I think I discussed it with Aaron or Shannon once, but I can't seem to find it. Might have been on Twitter :-S
But I think I found a snippet I have used for multiple search terms; see if this gives you the expected result:
var searchCriteria = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"]
    .CreateSearchCriteria(IndexTypes.Content);

var fields = new[] { "id", "firstName", "lastName", "nodeName" };
var term = new IExamineValue[] { searchTerm.ToLower().Escape() };

var query = searchCriteria.GroupedOr(fields, term).Compile();
var results = ExamineManager.Instance.SearchProviderCollection["CupcakeSearcher"].Search(query);
You are probably right. In the heat of the moment I thought I remembered that the .Escape() extension method would break the string up into multiple values, but I remembered wrong: it only escapes special characters and has nothing to do with splitting the string into an array of IExamineValues.
Sorry for the confusion.
I'm sure I will run into the same GroupedOr problem with a solution that's in testing at the moment :) so I will check back in this thread with my findings.
I used your code as inspiration / a departure point and came up with a sub-method that builds a query searching for multiple terms across multiple fields *and* allows the user to search for quoted substrings (i.e. the input Hamlet "Twelfth Night" Macbeth searches for "Hamlet", "Twelfth Night" and "Macbeth").
My problem now is that I don't know how to handle "common" words. For example, a search for "the scottish play" returns no results, because "the" becomes a required word in the search and Lucene seems to ignore it in the index. If anyone has pointers on that, I'm all ears.
if (searchString.Contains(@"""") && searchString.Count(t => t == '"') % 2 == 0) // even number of quotes, more than zero
{
    Regex quoteRegex = new Regex(@""".+?"""); // look for any content between quotes
    foreach (Match item in quoteRegex.Matches(searchString))
    {
        Terms.Add(new SearchTerm() { Term = item.Value.Replace('"', ' ').Trim(), TermType = SearchTermType.MultiWord });
        // remove the phrase from the search string for subsequent parsing;
        // plain string.Replace, since the matched text is not a regex pattern
        searchString = searchString.Replace(item.Value, string.Empty);
    }
}
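The SearchTerm and SearchTermType model classes are not shown in the thread. The sketch below is a self-contained guess at the whole tokenizer. It uses a single regex that matches either a quoted phrase or a run of non-space characters, so phrases and single words come out in input order. Only the type names SearchTerm and SearchTermType come from the thread. Their members (including SearchTermType.SingleWord) and the SearchTermParser class are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical reconstructions: the thread names these types but does not show them.
public enum SearchTermType { SingleWord, MultiWord }

public class SearchTerm
{
    public string Term { get; set; }
    public SearchTermType TermType { get; set; }
}

public static class SearchTermParser
{
    // Group 1: a quoted phrase such as "Twelfth Night"; group 2: a plain word.
    // An unmatched quote is simply treated as part of a word.
    private static readonly Regex TokenRegex = new Regex("\"([^\"]+)\"|(\\S+)");

    public static List<SearchTerm> Parse(string searchString)
    {
        var terms = new List<SearchTerm>();
        foreach (Match m in TokenRegex.Matches(searchString))
        {
            if (m.Groups[1].Success)
            {
                var phrase = m.Groups[1].Value.Trim();
                if (phrase.Length > 0) // skip quotes that contain only spaces
                    terms.Add(new SearchTerm { Term = phrase, TermType = SearchTermType.MultiWord });
            }
            else
            {
                terms.Add(new SearchTerm { Term = m.Groups[2].Value, TermType = SearchTermType.SingleWord });
            }
        }
        return terms;
    }
}
```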
I think we were initially misunderstanding how GroupedOr is supposed to be used - I know at least I was. In the source of Examine there is a test project with a lot of useful unit tests, which gives a good understanding of how to work with the API.
Consider these two snippets from the unit testing project:
public void FluentApi_Split_Search_Term()
{
    var searchTerm = "Billy Bob";

    var criteria = _searcher.CreateSearchCriteria();
    IQuery qry = criteria.GroupedOr(new[] { "PageTitle", "PageContent", "nodeName" }, searchTerm).Or();
    foreach (var t in searchTerm.Split(' '))
    {
        // loop body missing from the original post; presumably one GroupedOr per term
        qry = qry.GroupedOr(new[] { "PageTitle", "PageContent", "nodeName" }, t).Or();
    }

    var sdaf = qry.Field(UmbracoContentIndexer.IndexTypeFieldName, IndexTypes.Content).Compile();

    var results = _searcher.Search(sdaf);
}
So this is a search for the two words "Billy" and "Bob" in the three fields "PageTitle", "PageContent" and "nodeName". Notice how the query is built (pretty much how Jesper did it in his latest post, although the above is a little simpler): one search term per GroupedOr. Then consider this snippet and its comments:
public void FluentApiTests_Grouped_Or_Examiness()
{
    ////Arrange
    var criteria = _searcher.CreateSearchCriteria(IndexTypes.Content);

    //get all node type aliases starting with CWS_Home OR and all nodes starting with "About"
    var filter = criteria.GroupedOr(
            new[] { "nodeTypeAlias", "nodeName" },
            new[] { "CWS\\_Home".Boost(10), "About".MultipleCharacterWildcard() })
        .Compile();
So using multiple terms with multiple fields means that you search for term1 in field1 OR term2 in field2, which also explains Jesper's index-out-of-range error.
With regard to my snippet using .Escape(), Jonathan is right that it would not work in Jesper's scenario. It does work in my scenario, because I wanted the results with the two words in sequence. I should have considered this when adding the snippet.
Hope this helps others understand how to use this part of the Fluent API.
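The per-term pattern (one group per term, each group trying that term against every field) can be illustrated as a raw Lucene query string. The sketch below is hypothetical and string-only. It is not what Examine's Compile() emits verbatim, and it assumes the terms are already escaped and that the classic query parser's default operator is OR. Setting requireAll to true prefixes each group with "+", which gives the "every word must appear in some field" behaviour asked about later in the thread.

```csharp
using System.Linq;

public static class GroupedQueryBuilder
{
    // One parenthesised group per term; inside a group the term is tried
    // against every field, so the group matches if ANY field matches.
    // requireAll = true prefixes each group with "+" (every term must match
    // somewhere); requireAll = false leaves the groups optional (OR).
    // Terms are assumed to be escaped already.
    public static string Build(string[] fields, string[] terms, bool requireAll)
    {
        var groups = terms.Select(term =>
            (requireAll ? "+" : "") +
            "(" + string.Join(" ", fields.Select(f => f + ":" + term)) + ")");
        return string.Join(" ", groups);
    }
}
```

For example, Build(new[] { "firstName", "nodeName" }, new[] { "Phoenix", "Divorce" }, true) returns "+(firstName:Phoenix nodeName:Phoenix) +(firstName:Divorce nodeName:Divorce)".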
Awesome work, it's both inspiring and making me feel like a lazy sob, since I didn't go to the test code to check it out. :-P
I agree that the GroupedOr(string[], string[]) method seems to be designed for searching pairs of nodetypealiases and terms. And the more I think about it, the more convinced I am that it probably has to be that way. I'm not sure it would even be possible to make a method a la MultipleCharacterWildcard that could handle the searchterm the way we were looking for.
Perhaps the GroupedOr method would be a bit more intuitive, though, if it took something like a Dictionary<string, string> (field alias to term) instead of two independent arrays, to indicate that the aliases and terms are linked together the way they are.
Thanks Jonathan! Your code works better (i.e. it is the most Google-like) than any other I could find (I am new to Lucene and still don't understand much, so I can't write my own code yet).
Since we are using GroupedOr to search multiple fields, I don't think there is any easy way to boost nodeName, is there?
And perhaps we can manually remove common words from searchString in the BuildQuery function?
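A minimal sketch of that idea: drop common words before they are turned into required clauses. The index analyzer most likely strips such words at index time, so a required "+the" can never match. The word list below is a small illustrative subset chosen for this example. The list that actually matters is whichever one the index analyzer uses. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Illustrative subset only (an assumption); ideally this mirrors the
    // stop-word list of the analyzer that built the index.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "a", "an", "and", "in", "is", "of", "the", "to" };

    // Removes stop words (case-insensitively). May return an empty list if
    // every word was a stop word; the caller decides what to do then.
    public static List<string> RemoveStopWords(IEnumerable<string> words)
    {
        return words.Where(w => !StopWords.Contains(w)).ToList();
    }
}
```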
So, what is the latest word on this? I see discussion but no final outcome of how to get the desired results.
I am trying to search multiple fields with multiple words. If I pass in "Phoenix Divorce" as my search keyword, I want to get all instances where the record contains "Phoenix" and "Divorce", no matter where or in what field. Right now all I get are results for the exact string of "Phoenix Divorce".
I've concluded that GroupedOr is broken for multiple terms. It's clear when looking at the disassembly of GetMultiFieldQuery that there's a fault in the logic. Think about it: you can ALWAYS avoid an index-out-of-range error, because the length of an array is fixed and can be checked.
It's unfortunate that this doesn't get more attention from the development team as it's clearly a huge bug in a very common style of searching.
I've found the following work-around (similar to those above), which is to create separate calls per term (the new[] should be moved to a local or member, but you get the idea):
Not sure if this helps, but when I use Examine I always implement the GatheringNodeData event and inject a new field called contents, which is a concatenation of all the fields' content. I always search on that field. If I need to do anything a bit more specific, like a date range search or any other filter, I then search on those fields. So instead of having to pass in arrays of fields and terms, it's just the one field and any number of terms.
I had a similar issue and this thread pointed me in the right direction.
However, I had a more complex scenario where I might already have Or and And criteria specified. The solution for me was simply to create an array of identically named search fields, e.g.
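A minimal sketch of that trick, assuming the goal is simply to make the fields array as long as the terms array so the pairwise GroupedOr lines up term i with field i. The helper name is hypothetical. The usage line is untested against Examine and reuses the "contents" field idea from earlier in the thread.

```csharp
using System.Linq;

public static class RepeatedFields
{
    // Repeats one field name once per term, so fields.Length == terms.Length
    // and each term is paired with the same field.
    public static string[] FieldsFor(string fieldName, string[] terms)
    {
        return Enumerable.Repeat(fieldName, terms.Length).ToArray();
    }
}
```

Usage (sketch): criteria.GroupedOr(RepeatedFields.FieldsFor("contents", terms), terms), which should amount to "contents matches term1 OR term2 OR ...".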
I'm trying to set up a search that searches multiple fields for multiple terms with Examine on an Umbraco 4.7.0 install. I started out with some code I had used before on a 4.5.2 site, which basically looks like this:
The problem is that I can't get the criteria to create the query when there's more than one search term (i.e. when the searchTerm string is "Super Man", for example). The code always throws "IndexOutOfRangeException: Index was outside the bounds of the array" when searchTerm is split into multiple strings.
The problem also occurs when I try the GroupedOr overload that takes an IExamineValue[] as its second parameter.
It feels like some kind of bug, since similar code worked earlier, and the problem goes away when the array contains just a single item.
Any suggestions?
Regards
Jesper Hauge
Hi Jesper,
I have had a similar issue with Examine, and as far as I remember there seemed to be some kind of mismatch between the fields and terms in the GroupedOr method. One option would be to use multiple GroupedOr calls, but that would be a bit clumsy, I guess.
Have you tried using IExamineValue[] instead of string[] as the second parameter to GroupedOr? There might be something there.
I'm using new[] { searchTerm.MultipleCharacterWildcard() } as the second parameter for an autocomplete search, which works fine.
- Morten
Do you have a stack trace for the "IndexOutOfRangeException: Index was outside the bounds of the array" error? It would be interesting to see where the error occurs, because the overload you are using just calls the overload I was referring to in my previous post, which in turn calls the internal GroupedOrInternal method.
... and I just noted this comment in the code:
and the GetMultiFieldQuery method looks like this
That looks like it's being split up pairwise (value i goes with field i), so the method fails when there is more than one search term but fewer terms than fields.
- Morten
Hi Morten,
Thanks for your reply. As stated, I did try using IExamineValue[]; that didn't work either, and it throws the same exception.
I'm using new[] { LuceneSearchExtensions.MultipleCharacterWildcard(searchTerm) }, since string.MultipleCharacterWildcard() is marked as obsolete in the version I have. But as far as I can tell, that searches for the entire string including spaces, with a * appended to the end, thus not finding anything.
.Jesper
Stack trace looks like this:
So it looks like you're right. If I can find the time, I'll download the source, see if I can fix the code, and create a work item if I succeed.
.Jesper
For future reference, I ended up using the "multiple grouped or", which gave me an opportunity to revisit a code structure I haven't used for a long time:
.Jesper
Yeah, there does seem to be something in that method. Not sure I understand it, though. I think I discussed it with Aaron or Shannon once, but I can't seem to find it. Might have been on Twitter :-S
But I think I found a snippet I have used for multiple search terms; see if this gives you the expected result:
- Morten
Hi Morten,
Thanks for this, but unfortunately it doesn't work. Precisely because it only contains a single value in the array of query terms. Change:
To
And you'll see the problem
J
You are probably right. In the heat of the moment I thought I remembered that the .Escape() extension method would break the string up into multiple values, but I remembered wrong: it only escapes special characters and has nothing to do with splitting the string into an array of IExamineValues.
Sorry for the confusion.
I'm sure I will run into the same GroupedOr problem with a solution that's in testing at the moment :) so I will check back in this thread with my findings.
- Morten
Hey Jesper
I used your code as inspiration / a departure point and came up with a sub-method that builds a query searching for multiple terms across multiple fields *and* allows the user to search for quoted substrings (i.e. the input Hamlet "Twelfth Night" Macbeth searches for "Hamlet", "Twelfth Night" and "Macbeth").
My problem now is that I don't know how to handle "common" words. For example, a search for "the scottish play" returns no results, because "the" becomes a required word in the search and Lucene seems to ignore it in the index. If anyone has pointers on that, I'm all ears.
In case anyone can use it, here it is:
And then 2 small model classes (SearchTerm and SearchTermType):
Hi Guys,
I think we were initially misunderstanding how GroupedOr is supposed to be used - I know at least I was. In the source of Examine there is a test project with a lot of useful unit tests, which gives a good understanding of how to work with the API.
Consider these two snippets from the unit testing project:
So this is a search for the two words "Billy" and "Bob" in the three fields "PageTitle", "PageContent" and "nodeName". Notice how the query is built (pretty much how Jesper did it in his latest post, although the above is a little simpler): one search term per GroupedOr. Then consider this snippet and its comments:
So using multiple terms with multiple fields means that you search for term1 in field1 OR term2 in field2, which also explains Jesper's index-out-of-range error.
With regard to my snippet using .Escape(), Jonathan is right that it would not work in Jesper's scenario. It does work in my scenario, because I wanted the results with the two words in sequence. I should have considered this when adding the snippet.
Hope this helps others understand how to use this part of the Fluent API.
- Morten
Hi Morten and Jonathan,
Awesome work, it's both inspiring and making me feel like a lazy sob, since I didn't go to the test code to check it out. :-P
I agree that the GroupedOr(string[], string[]) method seems to be designed for searching pairs of nodetypealiases and terms. And the more I think about it, the more convinced I am that it probably has to be that way. I'm not sure it would even be possible to make a method a la MultipleCharacterWildcard that could handle the searchterm the way we were looking for.
Perhaps the GroupedOr method would be a bit more intuitive, though, if it took something like a Dictionary<string, string> (field alias to term) instead of two independent arrays, to indicate that the aliases and terms are linked together the way they are.
Thanks for your time and effort.
Regards
Jesper Hauge
> I think we were initially misunderstanding how GroupedOr is supposed to be used
I'm just copying you guys... I assumed you knew what was what :)
Please don't tell me you're still figuring it out, too ;)
J
Thanks Jonathan! Your code works better (i.e. it is the most Google-like) than any other I could find (I am new to Lucene and still don't understand much, so I can't write my own code yet).
Since we are using GroupedOr to search multiple fields, I don't think there is any easy way to boost nodeName, is there?
And perhaps we can manually remove common words from searchString in the BuildQuery function?
So, what is the latest word on this? I see discussion but no final outcome of how to get the desired results.
I am trying to search multiple fields with multiple words. If I pass in "Phoenix Divorce" as my search keyword, I want to get all instances where the record contains "Phoenix" and "Divorce", no matter where or in what field. Right now all I get are results for the exact string of "Phoenix Divorce".
Why not do it like this?
Thanks Kenneth Solberg, your solution works perfectly when you want to search for multiple words in multiple fields using the AND operator.
I've concluded that GroupedOr is broken for multiple terms. It's clear when looking at the disassembly of GetMultiFieldQuery that there's a fault in the logic. Think about it: you can ALWAYS avoid an index-out-of-range error, because the length of an array is fixed and can be checked.
It's unfortunate that this doesn't get more attention from the development team as it's clearly a huge bug in a very common style of searching.
I've found the following work-around (similar to those above), which is to create separate calls per term (the new[] should be moved to a local or member, but you get the idea):
Hi Chris,
If you believe there is a bug in GroupedOr in Examine, then the best thing to do is to report it on the project's CodePlex page:
http://examine.codeplex.com/
- Morten
Guys,
Not sure if this helps, but when I use Examine I always implement the GatheringNodeData event and inject a new field called contents, which is a concatenation of all the fields' content. I always search on that field. If I need to do anything a bit more specific, like a date range search or any other filter, I then search on those fields. So instead of having to pass in arrays of fields and terms, it's just the one field and any number of terms.
The code in the event handler that munges the fields together looks like this:
private void AddToContentsField(IndexingNodeDataEventArgs e)
{
var fields = e.Fields;
var combinedFields = new StringBuilder();
foreach (var keyValuePair in fields)
{
combinedFields.AppendLine(keyValuePair.Value);
}
e.Fields.Add("contents", combinedFields.ToString());
}
and the search code looks like this:
/// <summary>
/// performs quick lucene search
/// </summary>
/// <param name="filter">dictionary of query parameters where key maps to field alias in index</param>
/// <param name="negativeIdFilter">list of ids you want to ignore from search useful for news searches where stuff has already been picked</param>
/// <param name="nodeTypeAlias"></param>
/// <param name="SortBy"></param>
/// <param name="noToTake"></param>
/// <param name="queryDebug">returned query generated by examine good for debug stick it in hidden field</param>
/// <returns></returns>
public static List<SearchResult> PerformSearch(Dictionary<string,string> filter,
List<String> negativeIdFilter,
string nodeTypeAlias,
string SortBy,
int noToTake,
out string queryDebug){
var criteria = _searcher.CreateSearchCriteria(IndexTypes.Content);
IBooleanOperation query = criteria.NodeTypeAlias(nodeTypeAlias);
query = GetComplexQuery(query,filter);
foreach (var id in negativeIdFilter){
query = query.Not().Field("id", id);
}
if(!string.IsNullOrEmpty(SortBy)){
query = query.And().OrderByDescending(SortBy);
}
var searchResults = _searcher.Search(query.Compile());
queryDebug = criteria.ToString();
var displayResults = searchResults.Take(noToTake).ToList();
return displayResults;
}
/// <summary>
/// builds examine and query
/// </summary>
/// <param name="queryToBuild"></param>
/// <param name="filter"></param>
/// <returns></returns>
private static IBooleanOperation GetComplexQuery(IBooleanOperation queryToBuild,Dictionary<string,string> filter){
foreach (string key in filter.Keys){
string qsValue = filter[key];
if(qsValue.Contains(" "))
{
string[] terms = qsValue.Split(' ');
int termCount = 0;
foreach (var term in terms)
{
if(termCount==0){
queryToBuild = queryToBuild.And().Field(key, term);
}
else{
queryToBuild = queryToBuild.Or().Field(key, term);
}
termCount++;
}
}
else
{
if(qsValue.Contains("-")){
queryToBuild = queryToBuild.And().Field(key, qsValue.Escape());
}
else{
queryToBuild = queryToBuild.And().Field(key, qsValue.MultipleCharacterWildcard());
}
}
}
return queryToBuild;
}
Regards
Ismail
Sorry, I just realised there's no GroupedOr in there, but I have some code somewhere else:
queryToBuild.And().GroupedOr(contentsFields, postcodesList.ToArray());
Regards
Ismail
Has anyone found a solution to this?
Hi,
I had a similar issue and this thread pointed me in the right direction.
However, I had a more complex scenario where I might already have Or and And criteria specified. The solution for me was simply to create an array of identically named search fields, e.g.
Worked for me, might help someone else too!
Hi Kenneth,
Your code below will not work for multiple search terms with the OR operator:
var words = searchTerm.Split(' ');
var fields = new string[] { "nodeName", "someProperty", "anotherProperty" };
Examine.SearchCriteria.IBooleanOperation filter = null;
foreach (var word in words)
{
    if (filter == null)
    {
        filter = criteria.GroupedOr(fields, word);
    }
    else
    {
        filter = filter.And().GroupedOr(fields, word);
    }
}
For anyone stumbling upon this: you might want to check out my Stack Overflow answer here.
We are not constrained to the Examine query API, which is poor at building complex queries that involve nested boolean logic.
You might want to check out this API, which can build the Lucene query; you can then pass that to Examine as a raw query.
https://stackoverflow.com/questions/44913836/umbraco-examine-nested-multiple-boolean-operators/44914535#44914535
https://github.com/lelandrichardson/lucene-fluent-query-builder
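When a query string is built by hand and passed to Examine as a raw query, user input should be escaped first so that characters like ( ) : or / are matched literally. The sketch below backslash-escapes the special characters of Lucene's classic query syntax. The class name is hypothetical. Lucene's own QueryParser also offers a static Escape helper, which is preferable where it is available.

```csharp
using System.Text;

public static class LuceneEscaper
{
    // Characters with special meaning in Lucene's classic query syntax
    // (&& and || are covered by escaping each & and |).
    private const string Special = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            if (Special.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```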