The issue is that results are erratic. For instance, a result appears in a 10 km search but not in a 100 km one, which is obviously wrong.
Some investigation took me to the point where it seems that Lucene 3.x spatial is deprecated and doesn't work very well, but Umbraco doesn't accept Lucene >= 3.0.0.
Does anyone know if this is a known issue? Any workarounds appreciated.
I have a worked example based on this article for my Examine course. One thing: how are you indexing the items? Are they Umbraco nodes or in a database?
// Class-level fields, so the tiers built here are the same ones read in Indexer_DocumentWriting.
private readonly List<CartesianTierPlotter> _ctps = new List<CartesianTierPlotter>();
private readonly IProjector _projector = new SinusoidalProjector();

private void BuildGeoSpatialSearchTiers(ApplicationContext applicationContext)
{
    CartesianTierPlotter ctp = new CartesianTierPlotter(0, _projector, CartesianTierPlotter.DefaltFieldPrefix);
    // The range values are passed to BestFit as-is (no km-to-miles conversion at this point).
    int highestTier = ctp.BestFit(100);
    int endTier = ctp.BestFit(1);
    // One plotter per tier, from the widest range to the narrowest.
    for (int i = highestTier; i <= endTier; i++)
    {
        _ctps.Add(new CartesianTierPlotter(i, _projector, CartesianTierPlotter.DefaltFieldPrefix));
    }
    if (applicationContext.IsConfigured && applicationContext.DatabaseContext.IsDatabaseConfigured)
    {
        //Not sure why this is done here and not in the actual ApplicationStarted, but I don't think it will make a difference.
        var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection[SearchService.SearcherExternalIndex];
        indexer.DocumentWriting += new EventHandler<DocumentWritingEventArgs>(Indexer_DocumentWriting);
    }
}
And the indexing is happening here:
private void Indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
    // _ctps is the class-level list filled in BuildGeoSpatialSearchTiers.
    // Re-creating it here as a local would leave it empty, and no tier fields would be written.
    if (e.Fields["nodeTypeAlias"] == Stockist.ModelTypeAlias)
    {
        var stockistContent = new UmbracoHelper(ContextHelpers.EnsureUmbracoContext()).TypedContent(e.Fields["id"]);
        if (stockistContent == null)
            return;
        Stockist stockist = new Stockist(stockistContent); //Models builder model
        if (stockist != null && stockist.AddressLookup != null)
        {
            var latitudeDecimal = stockist.AddressLookup.Latitude;
            var longitudeDecimal = stockist.AddressLookup.Longitude;
            double lat = Convert.ToDouble(latitudeDecimal);
            double lng = Convert.ToDouble(longitudeDecimal);
            //Add the longitude and latitude to the indexer
            e.Document.Add(new Field("_lat", NumericUtils.DoubleToPrefixCoded(lat), Field.Store.YES, Field.Index.NOT_ANALYZED));
            e.Document.Add(new Field("_long", NumericUtils.DoubleToPrefixCoded(lng), Field.Store.YES, Field.Index.NOT_ANALYZED));
            //Loop through each of our tiers
            for (int i = 0; i < _ctps.Count; i++)
            {
                CartesianTierPlotter ctp = _ctps[i];
                var boxId = ctp.GetTierBoxId(lat, lng);
                //Add the tier data to the indexer
                e.Document.Add(new Field(ctp.GetTierFieldName(), NumericUtils.DoubleToPrefixCoded(boxId), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            }
        }
    }
}
It seems that's all the code involved in the indexing.
I modified the code so it uses the miles conversion; I guess there was a unit error there, as it was supposed to be km, not miles.
Regardless of that change, I created a new node and it wasn't properly indexed: it appeared in a 10 km search but not in a 100 km one. There must definitely be something wrong with Lucene Spatial (?).
Unfortunately, reindexing each time a new node is entered is not an option, as this client has 17k nodes to index; I tried it locally and it took 6 hours.
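For reference, the unit conversion mentioned above can be sketched like this. It is only a minimal illustration, assuming that BestFit expects its range in miles (which is what the miles conversion in the thread implies). DistanceUnits and ToMiles are hypothetical names, not part of Lucene.Net or of the original code.

```csharp
// Hypothetical helper: not part of Lucene.Net, Examine or Umbraco.
public static class DistanceUnits
{
    // 1 km is approximately 0.621371 miles.
    public const double KmsToMiles = 0.621371;

    // Converts a distance in kilometres to miles.
    public static double ToMiles(double km)
    {
        return km * KmsToMiles;
    }
}
```

Under that assumption, the tier range would be computed from something like ctp.BestFit(DistanceUnits.ToMiles(100)) rather than ctp.BestFit(100).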
And the results are not returned ordered by distance; that is done afterwards with just:
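// distanceResults is just a list of IPublishedContent
// OrderBy returns a new sequence, so the sorted result has to be assigned back.
distanceResults = distanceResults.OrderBy(x => x.Distance).ToList();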
So some of the results were left out: the doc limit cut the result set before it was sorted by distance.
To fix the issue I just changed the doc limit to 500, but that can cause the same issue in the future. So the question now is whether there is a way of returning the top 50 results ordered by distance.
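A small self-contained sketch of why capping before sorting gives the "appears at 10 km but not at 100 km" behaviour. This is not the actual search code: Hit, DocLimitDemo and the sample data are hypothetical, and it assumes hits come back in index order when nothing else sorts them.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not an Umbraco or Lucene type.
public class Hit
{
    public string Name { get; set; }
    public double DistanceKm { get; set; }
}

public static class DocLimitDemo
{
    // Mimics the reported flow: filter by radius, cap the hit count in
    // index order, and only then sort by distance.
    public static List<Hit> LimitThenSort(IEnumerable<Hit> indexOrder, double radiusKm, int docLimit)
    {
        return indexOrder
            .Where(h => h.DistanceKm <= radiusKm)
            .Take(docLimit)
            .OrderBy(h => h.DistanceKm)
            .ToList();
    }

    // Sorts by distance first and caps afterwards, which is the effect of
    // letting the search itself sort by distance.
    public static List<Hit> SortThenLimit(IEnumerable<Hit> indexOrder, double radiusKm, int docLimit)
    {
        return indexOrder
            .Where(h => h.DistanceKm <= radiusKm)
            .OrderBy(h => h.DistanceKm)
            .Take(docLimit)
            .ToList();
    }

    // Three made-up stockists in index order; the nearby one was indexed last.
    public static List<Hit> SampleIndex()
    {
        return new List<Hit>
        {
            new Hit { Name = "A", DistanceKm = 80 },
            new Hit { Name = "B", DistanceKm = 60 },
            new Hit { Name = "C", DistanceKm = 5 },
        };
    }
}
```

With a doc limit of 2, LimitThenSort finds C for a 10 km radius (it is the only candidate), but for 100 km it keeps A and B and drops C before the sort ever runs. SortThenLimit keeps C in both cases. Raising the limit only moves the point at which this happens.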
So they create a sorting filter. This is adapted to my implementation:
var distanceFilter = new LatLongDistanceFilter(boundingArea, radius, Lat, Long, "_lat", "_long");
DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(distanceFilter);
// It seems the 'geo_distance' field is created by the distance filter, so there is no need to index it.
Sort sort = new Sort(new SortField("geo_distance", dsort));
// The sort is passed to the search, so these results come back already sorted by distance.
TopDocs resultDocs = searcher.Search(masterQuery, null, 100, sort);
The way they do it in the example is a bit different:
// Instead of creating a filter directly, they create a distance query builder
var dq = new DistanceQueryBuilder(Lat, Long, Convert.ToDouble(radius), "_lat", "_long", CartesianTierPlotter.DefaltFieldPrefix, true);
// getDistanceFilter() in the Java example; assumed to be the DistanceFilter property in Lucene.Net
DistanceFieldComparatorSource dsort = new DistanceFieldComparatorSource(dq.DistanceFilter);
Sort sort = new Sort(new SortField("geo_distance", dsort));
Query query = new MatchAllDocsQuery();
// Then they access the created filter from the builder (getFilter() in the Java example, the Filter property here)
TopDocs hits = searcher.Search(query, dq.Filter, 20, sort);
Thanks for your help with this, Ismail, you pointed me in the right direction :).
Erratic results with Lucene Spatial
I have inherited two sites from two different clients and both seem to have the same issue. They implement a spatial search based on this article:
https://www.leapinggorilla.com/Blog/Read/1010/spatial-search-in-lucenenet---worked-example
The issue is that results are erratic. For instance, a result appears in a 10 km search but not in a 100 km one, which is obviously wrong.
Some investigation took me to the point where it seems that Lucene 3.x spatial is deprecated and doesn't work very well, but Umbraco doesn't accept Lucene >= 3.0.0.
Does anyone know if this is a known issue? Any workarounds appreciated.
Thanks.
Mario,
I have a worked example based on this article for my Examine course. One thing: how are you indexing the items? Are they Umbraco nodes or in a database?
Regards
Ismail
Hi Ismail,
I put all the code that indexes the content together. They are Umbraco nodes:
Tiers are built here:
And the indexing is happening here:
It seems that's all the code involved in the indexing.
Thanks.
Mario,
The locations that do not work, are they newish locations? Were they added after the main index build?
I found with my setup that if I added new nodes they would not appear; after an index rebuild, however, they would.
Also with my implementation for the values:
I have the values from the original post so in my case:
and
Try the index rebuild. If that doesn't work, try updating the start and end tier values, then restart, then rebuild the index and see if that works.
Regards
Ismail
Thanks Ismail,
I modified the code so it uses the miles conversion; I guess there was a unit error there, as it was supposed to be km, not miles.
Regardless of that change, I created a new node and it wasn't properly indexed: it appeared in a 10 km search but not in a 100 km one. There must definitely be something wrong with Lucene Spatial (?).
Unfortunately, reindexing each time a new node is entered is not an option, as this client has 17k nodes to index; I tried it locally and it took 6 hours.
Mario,
What happened after you rebuilt the index locally? Did it appear correctly? What about just republishing that one node?
Regards
Ismail
So after building and rebuilding... it turned out that the issue was in the search and not in the indexing.
So the search was doing this:
And the results are not returned ordered by distance; that is done afterwards with just:
So some of the results were left out: the doc limit cut the result set before it was sorted by distance.
To fix the issue I just changed the doc limit to 500, but that can cause the same issue in the future. So the question now is whether there is a way of returning the top 50 results ordered by distance.
Mario,
Distance is calculated, so you cannot sort on it until it has been calculated.
I am just looking at my code, and it looks like I'm pulling back 100. I will have a play with this tomorrow, as I will have the same issue as you.
Regards
Ismail
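To illustrate what "distance is calculated" means, here is a minimal sketch of computing a distance from two latitude/longitude pairs at query time. It uses the haversine formula as a stand-in; this is an assumption for illustration, not necessarily the calculation Lucene's distance filter performs, and GeoMath is a hypothetical name.

```csharp
using System;

// Hypothetical helper: not part of Lucene.Net, Examine or Umbraco.
public static class GeoMath
{
    // Mean Earth radius in kilometres.
    private const double EarthRadiusKm = 6371.0;

    // Great-circle distance in km between two points given in decimal degrees.
    public static double HaversineKm(double lat1, double lng1, double lat2, double lng2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLng = ToRadians(lng2 - lng1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLng / 2) * Math.Sin(dLng / 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

The distance depends on the search origin, so it cannot be stored in the index ahead of time; it only exists once a query supplies a centre point, which is why any sort on it has to happen after (or as part of) the distance calculation.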
I found this article:
So they create a sorting filter. This is adapted to my implementation:
The way they do it in the example is a bit different:
Thanks for your help with this, Ismail, you pointed me in the right direction :).
Mario,
So now that you are adding the sort when doing the search, did you get rid of the LINQ sort?
Regards
Ismail
Mario,
Excellent find. I have just used this update on my Examine course spatial example; I now have a Lucene sort rather than a nasty LINQ sort on distance.
Regards
Ismail
Cool, yep, I removed the LINQ sort too.