Raw Lucene Date Range Query not working in Examine (Umbraco 8)
For Examine experts!
Have a strange issue with Examine in Umbraco 8. If I use the fluent API to create a RangeQuery using dates then it works, but if I execute the same query as a raw query then it doesn't. To provide an example:
    if (ExamineManager.Instance.TryGetIndex(global::Umbraco.Core.Constants.UmbracoIndexes.ExternalIndexName, out var index))
    {
        var searcher = index.GetSearcher();
        var query = searcher.CreateQuery("content")
            .RangeQuery<DateTime>(new string[] { "createDate" }, DateTime.Now.AddDays(-37), DateTime.Now);
        var results = query.Execute();
        // results.TotalItemCount = 2;
    }
So the above query returns 2 results, which is correct. It just returns items with a create date between the two specified dates, as expected.
However, if I take the raw Lucene query from the above code (e.g. via query.ToString()) I get this:
+(createDate:[637109872908600000 TO 637141840908600000])
(Note the date ticks values will change depending on what the current date is - this is just an example).
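To make the tick values easier to read, here is a minimal sketch that converts them back to dates. The class and method names (`TicksExample`, `FromTicks`, `DaysBetween`) are made up for illustration, and the sketch ignores any rounding Examine may apply before indexing.

```csharp
using System;

public static class TicksExample
{
    // A tick is a 100-nanosecond interval counted from 0001-01-01 00:00:00.
    public static DateTime FromTicks(long ticks)
    {
        return new DateTime(ticks);
    }

    // Number of days spanned by two tick values.
    public static double DaysBetween(long fromTicks, long toTicks)
    {
        return TimeSpan.FromTicks(toTicks - fromTicks).TotalDays;
    }
}
```

Calling `TicksExample.DaysBetween(637109872908600000L, 637141840908600000L)` gives 37, which matches the `AddDays(-37)` in the fluent query.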
Now if I run this in the back-office Examine Management dashboard it returns every node, even if I amend it just to search content. So my query would actually be:
+__IndexType:content +createDate:[637109872908600000 TO 637141840908600000]
This still returns every content node in the site.
In fact, even this query below returns every node when I would expect it to return none (as the 'from' date is after the 'to' date).
+__IndexType:content +createDate:[999999999999999999 TO 637143565034990000]
I've run these queries in the back-office Examine Management dashboard and also via Luke.Net, to the same effect. It's as if the range query is ignored.
So my question is why does the date range query work when constructed via the fluent API, but not if constructed manually as a raw query?
Is there any way to get the raw query to work? (I'm building a rather complex query via a raw Lucene query and it works fine apart from this date range part.) Note I'm searching the built-in createDate field, not a custom field, and I can see the date is being stored as ticks in the index. I'm using the Standard Analyser, too.
I don't really have an answer, but I looked at two different indexes: one from v7, where a date range query ran fine in Luke, and one from v8, where I was also unable to run a date range query in Luke (no results). It looks like the fields are indexed differently now, so I wonder if that has something to do with it.
In v7, Luke shows the createDate flags as ISV but in v8 they are now ITSf0
I'm no expert in this stuff, but it could be a clue.
The person you really want to get involved is Ismail 😁
I'm also guessing the Fluent API might also capture some metadata about the field type and maybe pass that into the query in some other way so it's not immediately obvious by ToString-ing the query.
Thanks, Matt! Yeah, dates are definitely stored differently in 8 than 7. In 8 they are stored as rounded ticks - which is essentially just a long integer. So in essence the range query just treats them as a numeric range, which you would have thought would be relatively simple.... You can see how Shannon does it in the Examine source code.
Like you say, maybe Shannon does something else to make this work beyond what is visible in the raw query. I would ping Ismail, but don't like singling out people - I'm sure he gets enough hassle with Examine :) But thanks for checking, I appreciate it.
It's because the Lucene query parser parses ranges ONLY as string ranges, not as numerical or date ranges. This isn't a bug in Examine per se but one in Lucene, though I know how to work around it.
Thanks, Shannon, I was wondering if it was something like that.
How do you work around it in Examine, then? I looked through the source code but couldn't really pick up anything. Presumably at some point your API has to turn everything into a raw Lucene query, so how do you get it to treat a range as numeric? Thanks!
"Presumably at some point your API has to turn everything into a raw Lucene query, so how do you get it to treat a range as numeric?"
Actually it's the reverse of this :) Sure, Lucene can work from a string query, but internally it works with query objects. Examine creates these query objects directly and doesn't build up a string query. When you pass a string query to Lucene, it uses a QueryParser to break that string down into objects.
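To illustrate the point about query objects, here is a minimal sketch, assuming Lucene.Net 3.0.3 is referenced, that parses the string part of a query and attaches the date range as an object instead of as text. The names `CombinedQueryExample` and `Build` and the default field `nodeName` are made up for illustration. This is not how Examine does it internally, and it does not account for any rounding Examine applies to the stored ticks or for the precision step the index was built with.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using LuceneVersion = Lucene.Net.Util.Version;

public static class CombinedQueryExample
{
    public static Query Build(string rawQuery, DateTime from, DateTime to)
    {
        // The string part still goes through the QueryParser as usual.
        // "nodeName" is only a hypothetical default field for unqualified terms.
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "nodeName", analyzer);
        Query parsed = parser.Parse(rawQuery);

        // The date range is built as an object, so no string parsing is involved.
        // Assumption: createDate is indexed as a numeric (long) field holding ticks.
        Query dateRange = NumericRangeQuery.NewLongRange("createDate", from.Ticks, to.Ticks, true, true);

        // Require both parts to match.
        var combined = new BooleanQuery();
        combined.Add(parsed, Occur.MUST);
        combined.Add(dateRange, Occur.MUST);
        return combined;
    }
}
```

For example, `CombinedQueryExample.Build("+__IndexType:content", DateTime.Now.AddDays(-37), DateTime.Now)` would return a BooleanQuery with two required clauses. How the resulting Query object is executed depends on your setup and is left out here.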
Thanks, Shannon! Makes sense about the raw query part; I'd just never considered that. Will dig into it and see what I can do! Great to have Examine abstract some of these issues away :) Thanks.
You can follow the issue here https://github.com/Shazwazza/Examine/issues/133
Hey Dan,
I already have the fix in for Examine locally, just haven't pushed it yet. But it goes here: https://github.com/Shazwazza/Examine/blob/master/src/Examine/LuceneEngine/Search/CustomMultiFieldQueryParser.cs
You can see the Lucene method for converting a range query string to a RangeQuery object here: https://github.com/apache/lucenenet/blob/3.0.3/src/core/QueryParser/QueryParser.cs#L743
It tries to parse dates but not numbers, and then ends up calling NewRangeQuery: https://github.com/apache/lucenenet/blob/3.0.3/src/core/QueryParser/QueryParser.cs#L891
which, as you can see, just returns a TermRangeQuery; it doesn't check for numbers or anything else that would require a NumericRangeQuery.
So the fix is to override NewRangeQuery and detect numeric values.
The fix I have locally isn't complete though, so I've marked the issue as help-wanted (up for grabs) if anyone wants to take a stab at it.
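For anyone picking this up, the NewRangeQuery override described above might look roughly like the sketch below. It assumes Lucene.Net 3.0.3 and the NewRangeQuery signature at the linked line. The class name `NumericAwareQueryParser` is hypothetical, and this is not the fix that ended up in Examine. It also treats every all-digit range on any field as a long range, which would be wrong for string fields that happen to contain digits, so a real fix would need to know the field's type.

```csharp
using System.Globalization;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical parser that turns purely numeric ranges into numeric range queries.
public class NumericAwareQueryParser : QueryParser
{
    public NumericAwareQueryParser(LuceneVersion matchVersion, string defaultField, Analyzer analyzer)
        : base(matchVersion, defaultField, analyzer)
    {
    }

    protected override Query NewRangeQuery(string field, string part1, string part2, bool inclusive)
    {
        long lower, upper;

        // If both ends parse as 64-bit integers, build a numeric range
        // instead of the default string-based TermRangeQuery.
        if (long.TryParse(part1, NumberStyles.Integer, CultureInfo.InvariantCulture, out lower)
            && long.TryParse(part2, NumberStyles.Integer, CultureInfo.InvariantCulture, out upper))
        {
            return NumericRangeQuery.NewLongRange(field, lower, upper, inclusive, inclusive);
        }

        // Anything else keeps the stock behaviour.
        return base.NewRangeQuery(field, part1, part2, inclusive);
    }
}
```

Parsing `createDate:[637109872908600000 TO 637141840908600000]` with this parser should then produce a numeric range query rather than a term range query.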