Examine wildcard searches and no documentation or samples
Hi,
I am struggling to make sense of a few things without proper examples or docs, and I can't see anyone else having the same problem, so here goes.
I have an Umbraco API Controller that I am trying to use for search; the action takes a "search term" that gets passed into a query.
var indexer = _examineManager.Indexes.FirstOrDefault(w => w.Name.Equals("ExternalIndex"));
if (indexer == null) return model;
var searcher = indexer.GetSearcher();
var resultsByNodeName = searcher.CreateQuery("content")
    //.NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q)
    .NativeQuery(GetNativeLuceneQuery(q, string.Empty, _contentFieldsToSearch))
    .Execute(limit) // max limit of results
    .OrderByDescending(o => o.Score);
What am I trying to achieve?
Searching for "well" or "wellbeing" in the Backoffice Examine Manager returns 3 results, BUT my API is not finding ANY matching results (apart from other odd results).
My first solution was the "plain" approach, using
    .NodeTypeAlias("content").And().GroupedOr(_contentFieldsToSearch, q)
to create and execute the query. However, that does not deal with "wildcard" matches etc., so it is useless for me at this stage.
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
private string GetNativeLuceneQuery(string query, string type, string[] fields)
{
    if (string.IsNullOrWhiteSpace(query)) return string.Empty;
    if (string.IsNullOrWhiteSpace(query.Trim(new[] { '\"', '\'' }))) return string.Empty;

    //build a lucene query:
    // the __nodeName will be boosted 10x without wildcards
    // then __nodeName will be matched normally with wildcards
    // the rest will be matched with wildcards (or exactly, if the term is quoted)
    var nativeQuery = string.Empty;

    //check if text is surrounded by single or double quotes, if so, then exact match
    var surroundedByQuotes = Regex.IsMatch(query, "^\".*?\"$") || Regex.IsMatch(query, "^\'.*?\'$");
    if (surroundedByQuotes)
    {
        //strip quotes, escape string, then add the quotes back
        query = query.Trim(new[] { '\"', '\'' });
        query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
        if (string.IsNullOrWhiteSpace(query)) return string.Empty;

        //add back the surrounding quotes
        query = string.Format("{0}{1}{0}", "\"", query);

        //node name exactly boost x 10
        nativeQuery += $"+(__nodeName: ({query.ToLower()})^10.0 ";

        //additional fields normally
        //assign rather than +=: the seed already contains nativeQuery, so += would repeat the prefix
        nativeQuery = fields.Aggregate(nativeQuery, (current, f) => current + $"{f}: ({query}) ");
    }
    else
    {
        query = Lucene.Net.QueryParsers.QueryParser.Escape(query);
        var querywords = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        //node name exactly boost x 10
        nativeQuery += $"+(__nodeName:\"{query.ToLower()}\"^10.0 ";

        //node name normally with wildcards
        nativeQuery += " __nodeName:(";
        nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
        nativeQuery += ") ";

        foreach (var f in fields)
        {
            //additional fields with wildcards
            nativeQuery += $"{f}:(";
            nativeQuery = querywords.Aggregate(nativeQuery, (current, w) => current + $"{w.ToLower()}* ").Trim();
            nativeQuery += ") ";
        }
    }

    //close the field group and restrict to the index type
    nativeQuery += string.IsNullOrEmpty(type) ? ") +__IndexType:content" : $") +__IndexType:{type}";
    nativeQuery += " +excludeFromSearch:(0) "; // only search if not excluded from search
    return nativeQuery;
}
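To make it easier to see what the non-quoted branch of the helper above generates, here is a minimal stand-alone sketch that mirrors its string building without the Lucene.Net dependency. The names LuceneQuerySketch and BuildWildcardQuery are hypothetical, the bodyText field used below is only an assumed example, and escaping via QueryParser.Escape is deliberately left out, so treat this as an illustration of the query shape rather than a replacement for the helper.

```csharp
using System;
using System.Linq;

public static class LuceneQuerySketch
{
    // Hypothetical helper: mirrors the non-quoted branch of GetNativeLuceneQuery,
    // but skips QueryParser.Escape so that it runs without Lucene.Net.
    public static string BuildWildcardQuery(string query, string type, string[] fields)
    {
        if (string.IsNullOrWhiteSpace(query)) return string.Empty;

        var lowered = query.ToLowerInvariant();
        var words = lowered.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Each word becomes a trailing-wildcard term, e.g. "well*".
        var wildcardTerms = string.Join(" ", words.Select(w => w + "*"));

        // Exact node name match boosted x10, then a wildcard match on the node name.
        var nativeQuery = $"+(__nodeName:\"{lowered}\"^10.0 __nodeName:({wildcardTerms}) ";

        // Wildcard match on every additional field.
        foreach (var f in fields)
        {
            nativeQuery += $"{f}:({wildcardTerms}) ";
        }

        // Close the field group and restrict to the index type.
        nativeQuery += string.IsNullOrEmpty(type) ? ") +__IndexType:content" : $") +__IndexType:{type}";
        nativeQuery += " +excludeFromSearch:(0) ";
        return nativeQuery;
    }
}
```

Calling LuceneQuerySketch.BuildWildcardQuery("well", string.Empty, new[] { "bodyText" }) returns the following string (with a trailing space):

    +(__nodeName:"well"^10.0 __nodeName:(well*) bodyText:(well*) ) +__IndexType:content +excludeFromSearch:(0) 

Logging the real helper's output in the same way should make it possible to compare it directly with whatever query the Backoffice runs.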
The API returns 0 results for both "well" and "wellbeing"; the Lucene queries are as follows:
The absurd thing is that if I search the Backoffice for "abs", expecting to get at least 1 document (with the node name "Absence / Leave Information"), I get NO results, while the API returns what I expect to get...
Second solution (the helper shown above):
I have written a little helper, GetNativeLuceneQuery, to try to work out why on earth this does not return exactly the same set of results as the interface in the Backoffice. It is based on https://github.com/umbraco/Umbraco-CMS/blob/b260f6c6dedc6c8c123de5f2348b4d7f0dcb51bd/src/Umbraco.Web/Editors/EntityController.cs#L244
Please help.
I've just come across this, and it solved my "searching" needs.
https://our.umbraco.com/forum/umbraco-8/97707-examine-leading-wildcard#comment-309422