Thanks to Ismail's answer on StackOverflow regarding searching Lucene with spatial queries (Geosearching).
However I'm brand new to Lucene, so I've ordered Lucene in Action in order to learn more about the underlying technology and how it works.
I've rigged up a simple search using the walkthrough on Umbraco.TV, with the hope of extending it to integrate Spatial.net. On the surface, however, the way it performs the search is completely different from the way the Spatial.net addition for Lucene.net works.
Does anyone have hands-on experience creating a datatype in Umbraco that stores spatial data so that it can be displayed in the front end and back end, while allowing Lucene to search it using the Spatial.net package?
If not, it looks like I'm going to have to get my hands dirty with some heavy reading! :)
I'm no expert in spatial data (I haven't looked at Spatial.net yet either), but I'm curious how it could work with Umbraco. Definitely worth exploring.
From the data-type perspective, do you know if it's possible to serialize spatial data, e.g. "LINESTRING" and "POLYGON" co-ords? (PostGIS, GML2/GML3 - I googled these)
I googled around about this and found the SqlGeometryBuilder and SqlGeometry objects. If that didn't cause much overhead, it would be possible to store the serialized data as Ntext, with the data-type's render control deserializing it.
In the end I decided to store the data in a dedicated table in SQL Server 2008, configured to store spatial data. This way I can easily use the powerful spatial search features built in.
I've also rigged up event handlers so that whenever a document is added/saved/deleted, the spatial data is automatically extracted and passed into the spatial table in SQL Server.
Found this test class in Spatial.net which covers how to index and search using spatial: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_1/contrib/Spatial.Net/Tests/TestCartesian.cs. So you could use the Google Maps datatype to retrieve the geocodes for you and store those in the document in Umbraco. Then, on publish, tap into the GatheringNodeData event of Examine and index according to the code in the test class. That will give you data in the index that you can filter and sort on.
I don't know if anyone is still looking at this but I'd rather not post a new forum thread if this lies unsolved.
I'm looking to implement spatial searching using Lucene to plug in to an existing Examine-based search routine. I've already built a gathering event to index node relationship data into our index, and we already have Google providing lat/long data that is stored against our nodes.
But taking it from a Java context to something we'd write in a gathering event, and working out what the eventual raw Lucene query would look like, is still proving tricky.
The only other issue I can see is that, to take advantage of Lucene.Net spatial (https://nuget.org/packages/Lucene.Net.Contrib.Spatial), I'll have to recompile Umbraco Examine against the latest version of Lucene.Net.
Had to down tools on this problem to concentrate on the gathering method for geolocation. But in the intervening time it would seem that Gary H over at Leaping Gorilla has given us a tantalising walkthrough of putting this into action.
Along with a pretty descriptive walkthrough, he also helped with one of the more basic steps I had struggled with: just pulling in all the necessary libraries. For those also looking, the following NuGet will solve that little problem.
My remaining head-scratcher is how to implement this within an ExamineManager GatheringNodeData event.
It's easy enough to grab the fields containing our lat and long and create our Cartesian plotter. But once the plotter data is created, what's the best way to pass it into IndexingNodeDataEventArgs?
Excellent find on that article. With regards to getting it into the index, don't use GatheringNodeData: you need lower-level Lucene access, so use the DocumentWriting event instead. This gives you access to the underlying Lucene document; see http://thecogworks.co.uk/blog/posts/2013/april/examiness-hints-and-tips-from-the-trenches-part-10-document-writing-redux/ So you could take the code from the worked example and use it in the DocumentWriting event. Hit me up on Skype if you've got any questions: ismail_mayat
Think I might end up asking for your expert opinion, but I thought I should pop my progress up here. I do believe I'm halfway there.
Thanks to your article on thecogworks I was able to add a second event handler to my Examine-extending class to give me lower-level access to the index.
So to begin with I defined the variables to set up the indexer: my index name, the maximum and minimum radius, my plotter dictionary and the location prefix for the indexed fields.
Next, in OnApplicationStarted, I initialised my projector and built my location tiers:
// Projector plus a throwaway plotter used only to work out the best-fit tiers
IProjector projector = new SinusoidalProjector();
var ctp = new CartesianTierPlotter(0, projector, LocationTierPrefix);
_startTier = ctp.BestFit(MaxM);
_endTier = ctp.BestFit(MinM);
// One plotter per tier between the best fits for the maximum and minimum radius
Plotters = new Dictionary<int, CartesianTierPlotter>();
for (var tier = _startTier; tier <= _endTier; tier++)
{
    Plotters.Add(tier, new CartesianTierPlotter(tier, projector, LocationTierPrefix));
}
var indexer = (UmbracoContentIndexer)ExamineManager.Instance.IndexProviderCollection[JobIndex];
indexer.DocumentW
Finally, the event itself is fairly simple. We define containers for the latitude and longitude values, then test for their presence in the document being indexed. If they are present, we encode the lat and long values and then work through our defined tiers, adding a tier field for each.
string _geolat;
string _geolong;
if (e.Fields.TryGetValue("geolat", out _geolat) && e.Fields.TryGetValue("geolong", out _geolong))
{
    // Store the prefix-coded latitude and longitude for the distance filter
    e.Document.Add(new Field("codedlat", NumericUtils.DoubleToPrefixCoded(Convert.ToDouble(_geolat)), Field.Store.YES, Field.Index.NOT_ANALYZED));
    // NOTE: "codedlang" is the field-name typo identified later in the thread; the search filter reads "codedlong"
    e.Document.Add(new Field("codedlang", NumericUtils.DoubleToPrefixCoded(Convert.ToDouble(_geolong)), Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Add one tier field per plotter so the bounding-area filter can match grid squares
    for (var tier = _startTier; tier <= _endTier; tier++)
    {
        var ctp = Plotters[tier];
        var boxId = ctp.GetTierBoxId(Convert.ToDouble(_geolat), Convert.ToDouble(_geolong));
        e.Document.Add(new Field(ctp.GetTierFieldName(),
            NumericUtils.DoubleToPrefixCoded(boxId),
            Field.Store.YES,
            Field.Index.NOT_ANALYZED_NO_NORMS));
    }
}
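One thing worth guarding against in the handler above: Convert.ToDouble(string) parses with the server's current culture, so on a machine whose culture uses a comma as the decimal separator a value such as "51.5074" can be misread or rejected. A minimal sketch of a culture-independent parse follows. The GeoParsing class and TryParseCoordinate method are hypothetical names for illustration, not part of Examine or Lucene.Net, and it assumes the stored values use a dot as the decimal separator.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Examine or Lucene.Net): parses latitude and
// longitude strings independently of the server culture and range-checks them.
public static class GeoParsing
{
    public static bool TryParseCoordinate(string latText, string lngText,
                                          out double lat, out double lng)
    {
        lat = 0;
        lng = 0;

        // InvariantCulture always treats '.' as the decimal separator.
        bool parsed =
            double.TryParse(latText, NumberStyles.Float, CultureInfo.InvariantCulture, out lat) &&
            double.TryParse(lngText, NumberStyles.Float, CultureInfo.InvariantCulture, out lng);

        if (!parsed)
        {
            return false;
        }

        // Reject values outside the valid coordinate ranges (also rejects NaN).
        return lat >= -90 && lat <= 90 && lng >= -180 && lng <= 180;
    }
}
```

In the event handler, the two Convert.ToDouble calls could then be replaced by a single call to this helper, skipping the document's spatial fields when it returns false.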
This results in what I would expect to see in the index:
Top ranking terms
No. | Rank | Field | Text
1 | 2 | LocationTierPrefix_11 | ?ntC
2 | 2 | LocationTierPrefix_12 | ?g?tC
3 | 2 | LocationTierPrefix_13 | ?_?tC
4 | 2 | LocationTierPrefix_14 | ?WO~mB(
5 | 2 | LocationTierPrefix_2 | @Lf3Le
6 | 2 | LocationTierPrefix_10 | ?urr#Q'
7 | 2 | LocationTierPrefix_4 | @ua#kBGV
8 | 2 | LocationTierPrefix_5 | @ua#kBGV
9 | 2 | LocationTierPrefix_6 | @ua#kBGV
10 | 2 | LocationTierPrefix_7 | @~|vd-
11 | 2 | LocationTierPrefix_8 | ?>;2CJ
12 | 2 | LocationTierPrefix_9 | ?{~|vd-
13 | 2 | LocationTierPrefix_3 | @Lf3Le
14 | 1 | codedlat | @%a533
15 | 1 | codedlang | @=$2i.c
Now comes the tricky part of getting a BooleanQuery into my Examine search <.<
So far my first attempt at creating a searcher based on the examples I've looked at, while not throwing an error, isn't returning results (not a great sign).
My first stumbling block was that, up until this point, I'd used a StringBuilder whose contents were finally passed to criteria = criteria.RawQuery(stringbuilder) in order to search the index.
Following the Leaping Gorilla example, I built a
var masterQuery = new BooleanQuery();
From there I grabbed the coordinates from the query string and created a distance filter with a giant radius (just to try and capture results):
if (!string.IsNullOrEmpty(geoTerm))
{
    string[] coordinates = geoTerm.Split(',');
    double _lat = Convert.ToDouble(coordinates[0]);
    double _long = Convert.ToDouble(coordinates[1]);
    double _radius = 4000;
    /* Builder allows us to build a polygon which we will use to limit
     * search scope on our cartesian tiers, this is like putting a grid
     * over a map */
    var builder = new CartesianPolyFilterBuilder(LocationTierPrefix);
    /* Bounding area draws the polygon, this can be thought of as working
     * out which squares of the grid over a map to search */
    var boundingArea = builder.GetBoundingArea(_lat, _long, _radius);
    /* We refine, this is the equivalent of drawing a circle on the map,
     * within our grid squares, ignoring the parts of the squares we are
     * searching that aren't within the circle - ignoring extraneous corners
     * and such */
    var distFilter = new LatLongDistanceFilter(boundingArea,
        _radius,
        _lat,
        _long,
        "codedlat",
        "codedlong");
    /* Add our filter, this will stream through our results and determine eligibility */
    masterQuery.Add(new ConstantScoreQuery(distFilter), BooleanClause.Occur.MUST);
}
Passing the masterQuery into the SearchProviderCollection Search method meant converting that BooleanQuery into an ISearchCriteria by passing its ToString() output to RawQuery.
But so far no luck getting results out the other end. There is quite a lot going on in the middle here and I'm having trouble spotting what I've missed. I'd be more comfortable if I wasn't dicing between BooleanQuery and ISearchCriteria.
More progress, although I'm not sure it's quite finished. Looking through a few Cogworks posts I spotted an example of bypassing Examine altogether in favour of a pure Lucene search. This looks as if it's going to be the better solution, as there will be no need to pass or convert the constant score query.
    string indexRootPath = "~/App_Data/TEMP/ExamineIndexes/";
    // Define index name
    string indexName = "Test";
    string indexPath = indexRootPath + indexName + "/Index";
    var indexDirectory = FSDirectory.Open(new DirectoryInfo(HttpContext.Current.Server.MapPath(indexPath)));
    Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(indexDirectory);
    // Run the query directly against the Lucene searcher, bypassing Examine
    Lucene.Net.Search.TopDocs results = searcher.Search(mainQuery, null, searcher.MaxDoc());
}
<div>
    @foreach (ScoreDoc scoreDoc in results.ScoreDocs)
    {
        Document doc = searcher.Doc(scoreDoc.doc);
        string myFieldVale = doc.Get("nodeName");
        <p>@myFieldVale</p>
    }
</div>
So essentially I created a few short strings to define the index location, mapped the path and created a new Lucene IndexSearcher based on that path.
Running the earlier posted radius search through this searcher does indeed return good results, which is a significant step up from the null returned with Examine.
My remaining issues stem from erratic results when testing the distance of my search. At large distances results seem to be returned reasonably accurately, but at distances under 10 miles some results appear to be missing.
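When results go missing at small radii like this, one way to narrow it down is to compute the expected distance for a few known documents independently of the index and compare it with what the filter returns. The sketch below uses the standard haversine great-circle formula; the GeoDistance class name and the 3958.8 mile mean Earth radius are assumptions made for illustration, and it is only a debugging aid, not what Spatial.net does internally.

```csharp
using System;

// Hypothetical debugging helper: great-circle distance between two points,
// used to sanity-check which documents should fall inside a search radius.
public static class GeoDistance
{
    // Assumed mean Earth radius in miles.
    private const double EarthRadiusMiles = 3958.8;

    public static double HaversineMiles(double lat1, double lon1, double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        // Clamp to 1 to guard against rounding pushing the value out of Asin's domain.
        double c = 2 * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
        return EarthRadiusMiles * c;
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```

If a node that this calculation places well inside the radius is absent from the results, the problem is more likely in the indexed field values than in the radius itself.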
One day I am certain to look back at this dyslexic mistake and laugh...
Everything is now working as intended with one tiny tweak. In my earlier example document index writer I made a mistake when adding the prefix-coded value to my index: rather than adding codedlong I accidentally added codedlang. This meant that when running my search the filter always read a longitude of 0, which rather skewed results.
With all of this resolved I now have a working spatial search using Lucene and a custom document index writer.
Would have been great to do this through Examine, but it's just too restrictive.
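A field-name mismatch like the one described above is easy to make because the indexing handler and the search code each spell the names as separate string literals. One possible guard, sketched here with a hypothetical GeoFields class (the name and layout are assumptions, not something from Examine or the Leaping Gorilla article), is to define each name once and reference the constants on both sides.

```csharp
// Hypothetical shared constants: the indexing handler and the search code both
// reference these, so a misspelling becomes a compile error rather than an
// empty or skewed result set.
public static class GeoFields
{
    public const string Latitude = "codedlat";
    public const string Longitude = "codedlong";
    public const string TierPrefix = "LocationTierPrefix_";

    // Builds the field name for a given tier, e.g. tier 11 -> "LocationTierPrefix_11".
    public static string TierField(int tier)
    {
        return TierPrefix + tier;
    }
}
```

The indexer would then add fields under GeoFields.Latitude and GeoFields.Longitude, and the same two constants would be passed to the distance filter at search time.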
I've also been working on a test that integrates the Leaping Gorilla example, but I'm not getting any results. Any chance of you posting some code, or making it available for download somewhere?
OK - I got this working with Lucene 2.9.4.x, the one bundled with the latest Umbraco build (7.1.3). Hit me up if u need some code... I'll work on a blog post too, specifically for Umbraco...
Did you ever get around to writing that blog post? I'm having trouble integrating the Leaping Gorilla post with Examine as well. Would love to see how you solved it.
Hey Brendan - specifically which bits are you having problems with? Do you get the Lucene index created with the relevant data (latCoded, lonCoded, LocationTierPrefix_1 - 15 etc)? Is it trouble pulling results from the index?
The crux of the search is here. I grab the lat/lon of the user (using JavaScript location services) and search like this:
if (double.TryParse(Request.QueryString["lat"], out _lat) && double.TryParse(Request.QueryString["lon"], out _long))
{
    string indexPath = "~/App_Data/TEMP/ExamineIndexes/Venue/Index";
    var indexDirectory = FSDirectory.Open(new DirectoryInfo(HttpContext.Current.Server.MapPath(indexPath)));
    Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(indexDirectory);
    double KmsToMiles = 0.621371192;
    // Search radius of 300 km, converted to miles
    double _radius = 300 * KmsToMiles;
    /* Builder allows us to build a polygon which we will use to limit
     * search scope on our cartesian tiers, this is like putting a grid
     * over a map */
    var builder = new CartesianPolyFilterBuilder("LocationTierPrefix_");
    /* Bounding area draws the polygon, this can be thought of as working
     * out which squares of the grid over a map to search */
    var boundingArea = builder.GetBoundingArea(_lat, _long, _radius);
    /* We refine, this is the equivalent of drawing a circle on the map,
     * within our grid squares, ignoring the parts of the squares we are
     * searching that aren't within the circle - ignoring extraneous corners
     * and such */
    var distFilter = new LatLongDistanceFilter(boundingArea,
        _radius,
        _lat,
        _long,
        "latCoded",
        "lonCoded");
    var masterQuery = new BooleanQuery();
    /* Add our filter, this will stream through our results and determine eligibility */
    masterQuery.Add(new ConstantScoreQuery(distFilter), BooleanClause.Occur.MUST);
    // Run the query directly against the Lucene searcher, bypassing Examine
    Lucene.Net.Search.TopDocs results = searcher.Search(masterQuery, null, searcher.MaxDoc());
    // Map each hit to its node id and distance (converted back to km), nearest first
    sortedResults = results.ScoreDocs
        .Select(sd => new Helper.LocationSearchResult(
            sd.score,
            int.Parse(searcher.Doc(sd.doc).GetField("id").StringValue()),
            distFilter.GetDistance(sd.doc) / KmsToMiles))
        .OrderBy(x => x.DistanceInKms)
        .Take(30)
        .ToList();
}
It's only a dozen lines of code, so it's not too bad. I tried everything to get it working through Examine, but no love. I believe when they update Lucene to 3.x it will be easier, so pls vote for this ticket as well :)
Lucene with spatial.net
Thanks to Ismail's answer on StackOverflow regarding searching Lucene with spatial queries (Geosearching).
However I'm brand new to Lucene, so I've ordered Lucene in Action in order to learn more about the underlying technology and how it works.
I've rigged up a simple search using the walkthrough on Umbraco.TV, with the hope of extending it to integrate Spatial.net. On the surface, however, the way it performs the search is completely different from the way the Spatial.net addition for Lucene.net works.
Does anyone have hands-on experience creating a datatype in Umbraco that stores spatial data so that it can be displayed in the front end and back end, while allowing Lucene to search it using the Spatial.net package?
If not, it looks like I'm going to have to get my hands dirty with some heavy reading! :)
Thanks
Pete
Hi Pete,
I'm no expert in spatial data (I haven't looked at Spatial.net yet either), but I'm curious how it could work with Umbraco. Definitely worth exploring.
From the data-type perspective, do you know if it's possible to serialize spatial data, e.g. "LINESTRING" and "POLYGON" co-ords? (PostGIS, GML2/GML3 - I googled these)
I googled around about this and found the SqlGeometryBuilder and SqlGeometry objects. If that didn't cause much overhead, it would be possible to store the serialized data as Ntext, with the data-type's render control deserializing it.
Just throwing ideas around...
Cheers, Lee.
Thanks for your thoughts Lee
In the end I decided to store the data in a dedicated table in SQL Server 2008, configured to store spatial data. This way I can easily use the powerful spatial search features built in.
I've also rigged up event handlers so that whenever a document is added/saved/deleted, the spatial data is automatically extracted and passed into the spatial table in SQL Server.
Pete
Guys,
Found this test class in Spatial.net which covers how to index and search using spatial: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_1/contrib/Spatial.Net/Tests/TestCartesian.cs. So you could use the Google Maps datatype to retrieve the geocodes for you and store those in the document in Umbraco. Then, on publish, tap into the GatheringNodeData event of Examine and index according to the code in the test class. That will give you data in the index that you can filter and sort on.
Regards
Ismail
Hi Guys,
I don't know if anyone is still looking at this but I'd rather not post a new forum thread if this lies unsolved.
I'm looking to implement spatial searching using Lucene to plug in to an existing Examine-based search routine. I've already built a gathering event to index node relationship data into our index, and we already have Google providing lat/long data that is stored against our nodes.
The reading I've done around this, however, has really had me scratching my head. The best walkthrough of the process I've found is here: http://www.mhaller.de/archives/156-Spatial-search-with-Lucene.html
But taking it from a Java context to something we'd write in a gathering event, and working out what the eventual raw Lucene query would look like, is still proving tricky.
The only other issue I can see is that, to take advantage of Lucene.Net spatial (https://nuget.org/packages/Lucene.Net.Contrib.Spatial), I'll have to recompile Umbraco Examine against the latest version of Lucene.Net.
Has anyone else actually tried all this before?
drew,
Spatial works with 2.9.1 and Examine is on 2.9.4, so it should work without having to rebuild Examine?
Regards
Ismail
Hi Ismail,
That sounds like a much less troublesome option. Via NuGet, https://nuget.org/packages/Lucene.Net.Contrib.Spatial/3.0.3 seems to call for a dependency on Lucene.Net > 3.0.3.
Is there an older version of Lucene.Net.Contrib.Spatial out there, or am I simply looking at the wrong Spatial library?
Thanks =)
See https://svn.apache.org/repos/asf/lucene.net/tags/Lucene.Net_2_9_1/contrib/Spatial.Net/ which is the 2.9.1 version.
Hi Ismail,
Had to down tools on this problem to concentrate on the gathering method for geolocation. But in the intervening time it would seem that Gary H over at Leaping Gorilla has given us a tantalising walkthrough of putting this into action.
See here http://leapinggorilla.com/Blog/Read/1010/spatial-search-in-lucenenet---worked-example
Along with a pretty descriptive walkthrough, he also helped with one of the more basic steps I had struggled with: just pulling in all the necessary libraries. For those also looking, the following NuGet will solve that little problem.
My remaining head-scratcher is how to implement this within an ExamineManager GatheringNodeData event.
It's easy enough to grab the fields containing our lat and long and create our Cartesian plotter. But once the plotter data is created, what's the best way to pass it into IndexingNodeDataEventArgs?
Obviously will pop up my code as I progress =)
Drew,
Excellent find on that article. With regards to getting it into the index, don't use GatheringNodeData: you need lower-level Lucene access, so use the DocumentWriting event instead. This gives you access to the underlying Lucene document; see http://thecogworks.co.uk/blog/posts/2013/april/examiness-hints-and-tips-from-the-trenches-part-10-document-writing-redux/ So you could take the code from the worked example and use it in the DocumentWriting event. Hit me up on Skype if you've got any questions: ismail_mayat
Regards
Ismail
Hi Ismail,
Think I might end up asking for your expert opinion, but I thought I should pop my progress up here. I do believe I'm halfway there.
Thanks to your article on thecogworks I was able to add a second event handler to my Examine-extending class to give me lower-level access to the index.
So to begin with I defined the variables to set up the indexer: my index name, the maximum and minimum radius, my plotter dictionary and the location prefix for the indexed fields.
Next, in OnApplicationStarted, I initialised my projector and built my location tiers.
Finally, the event itself is fairly simple. We define containers for the latitude and longitude values, then test for their presence in the document being indexed.
If they are present, we encode the lat and long values and then work through our defined tiers, adding a tier field for each.
This results in what I would expect to see in the index
Top ranking terms
Now comes the tricky part of getting a BooleanQuery into my Examine search <.<
Drew,
What about the standard Examine boolean OR and AND operators, do they not work?
Regards
Ismail
Hi Ismail,
So far my first attempt at creating a searcher based on the examples I've looked at, while not throwing an error, isn't returning results (not a great sign).
My first stumbling block was that, up until this point, I'd used a StringBuilder whose contents were finally passed to criteria = criteria.RawQuery(stringbuilder) in order to search the index.
Following the Leaping Gorilla example, I built a BooleanQuery.
From there I grabbed the coordinates from the query string and created a distance filter with a giant radius (just to try and capture results).
Passing the masterQuery into the SearchProviderCollection Search method meant converting that BooleanQuery into an ISearchCriteria by passing its ToString() output to RawQuery.
But so far no luck getting results out the other end. There is quite a lot going on in the middle here and I'm having trouble spotting what I've missed. I'd be more comfortable if I wasn't dicing between BooleanQuery and ISearchCriteria.
Morning,
More progress, although I'm not sure it's quite finished. Looking through a few Cogworks posts I spotted an example of bypassing Examine altogether in favour of a pure Lucene search. This looks as if it's going to be the better solution, as there will be no need to pass or convert the constant score query.
So essentially I created a few short strings to define the index location, mapped the path and created a new Lucene IndexSearcher based on that path.
Running the earlier posted radius search through this searcher does indeed return good results, which is a significant step up from the null returned with Examine.
My remaining issues stem from erratic results when testing the distance of my search. At large distances results seem to be returned reasonably accurately, but at distances under 10 miles some results appear to be missing.
One day I am certain to look back at this dyslexic mistake and laugh...
Everything is now working as intended with one tiny tweak. In my earlier example document index writer I made a mistake when adding the prefix-coded value to my index: rather than adding codedlong I accidentally added codedlang. This meant that when running my search the filter always read a longitude of 0, which rather skewed results.
With all of this resolved I now have a working spatial search using Lucene and a custom document index writer.
Would have been great to do this through Examine, but it's just too restrictive.
Oh well on to faceted search =)
Hi Drew,
I've also been working on a test that integrates the Leaping Gorilla example, but I'm not getting any results. Any chance of you posting some code, or making it available for download somewhere?
Hi Drew,
+1 for some sample code :)
OK - I got this working with Lucene 2.9.4.x, the one bundled with the latest Umbraco build (7.1.3). Hit me up if u need some code... I'll work on a blog post too, specifically for Umbraco...
Hey Karl.
Did you ever get around to writing that blog post? I'm having trouble integrating the Leaping Gorilla post with Examine as well. Would love to see how you solved it.
Cheers. Brendan
Hey Brendan - specifically which bits are you having problems with? Do you get the Lucene index created with the relevant data (latCoded, lonCoded, LocationTierPrefix_1 - 15 etc)? Is it trouble pulling results from the index?
Let me know, happy to help...
Cheers!
Just querying the data out. I've managed to get the documents indexed just fine, and the lat/long/locationTier fields are all present.
The crux of the search is here. I grab the lat/lon of the user (using JavaScript location services) and search like this:
What is the exact error you are getting?
I wasn't getting an error. I was just hoping to query through Examine directly without having to access the underlying Lucene IndexSearcher.
Thanks for that. I guess I'll just have to bite the bullet and get it done that way.
Cheers. Brendan
It's only a dozen lines of code, so it's not too bad. I tried everything to get it working through Examine, but no love. I believe when they update Lucene to 3.x it will be easier, so pls vote for this ticket as well :)