I have an Umbraco site with a custom backend database of several thousand members, using a custom membership provider.
These members need to be searchable on all manner of associated data.
The members have many complex properties in a multi-level table hierarchy, and many of those properties need to be searchable.
A naive search technique of fully instantiating the objects and filtering those properties is unusably slow. I could just store search properties on the master member table on every update, but for scalability and data integrity I'd prefer to use Examine.
This is the first time I've ever used it, so I have some questions!
1. There is a requirement for a postcode search - i.e. "distance from X". The only way I can think of doing this is to run the Examine search for everyone, get the latitudes and longitudes back, then post-filter on the calculated distances - is there a better way to do that?
2. Some of the properties are lists of supported actions stored in tables - i.e. if a member can do A, F and G, and a user searches for members who can do A, D, E - that's a match on A. What's the best way of dealing with this? I can probably come up with hacky ways of smushing the information into strings, but is there a more effective way?
3. Simple one - as per my code below I am setting up the info in a SimpleDataSet, meaning I have to update ExamineIndex.config as well as this code every time a field changes - not a big deal, but I just wondered if there was a more dynamic way?
The budget for this is not great, so the shortcuttiest way of doing things is probably fine :)
Any help gratefully received!
My (working, and beautifully fast) basic code so far is:
    public class UFCMemberFinderDataService : ISimpleDataService
    {
        public IEnumerable<SimpleDataSet> GetAllData(string indexType)
        {
            using (DbConnection conn = ConnectionHelper.Open(Settings.MainConnectionSettings))
            {
                DbCommand cmd = conn.CreateCommand();
                // Only publishable, authorized, active members go into the index.
                cmd.CommandText = @"select UserIID from UserProfileData where IsPublishable = 1 and IsAuthorized = 1 and IsActive = 1";
                using (DbDataReader rs = cmd.ExecuteReader())
                {
                    List<SimpleDataSet> data = new List<SimpleDataSet>();
                    while (rs.Read())
                    {
                        MemberWrapperBase mwrap = MemberWrapperBase.Get(DataHelper.FromDB<int>(rs["UserIID"]));
                        // Administrators are not searchable members.
                        if (!(mwrap is Administrator))
                        {
                            RegisteredUserBase rub = (RegisteredUserBase)mwrap;
                            data.Add(new SimpleDataSet()
                            {
                                NodeDefinition = new IndexedNode()
                                {
                                    NodeId = rub.UserIID,
                                    Type = "CustomData"
                                },
                                // Each key here must also be declared in ExamineIndex.config.
                                RowData = new Dictionary<String, String>()
                                {
                                    { "IID", rub.UserIID.ToString() },
                                    { "Type", rub.GetType().Name },
                                    { "FirstName", rub.FirstName },
                                    { "LastName", rub.LastName },
                                    { "CompanyName", rub is Company ? ((Company)rub).CompanyName : "" },
                                    { "PostCode", rub.Address.PostCode },
                                    { "IsFeatured", rub.IsFeatured ? "1" : "0" }
                                }
                            });
                        }
                    }
                    return data;
                }
            }
        }
    }
I have now got my search up and running, so in case it is helpful for anyone else, here is how I implemented it:
1. Postcode search was done as described above: the Examine results return easting / northing, and I then go through the results adding the distance from the searched-for postcode. The algorithm for this is very simple, so it adds no perceptible delay to the results.
2. List properties: I added these as pipe-separated string fields, so a member could have { "Services", "carpentry|plumbing|decorating" } in the dataset. Because of the way Examine searches by keyword, this works perfectly and trivially for simple queries. I have been using raw Lucene queries as they are easier to build dynamically, and the query for this is simply Services:plumbing. To search for members who can do carpentry and plumbing, you simply add more criteria to the Lucene query: (Services:carpentry AND Services:plumbing).
3. Dynamic index definition: I haven't found a solution for this, but it's not a big deal to make sure your index definition is up to date.
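For anyone wanting to see the shape of the distance post-filter from point 1, here is a minimal sketch. It assumes the easting / northing values are in metres on a flat grid (for UK postcodes that would typically be the OS National Grid, which is my assumption, not something stated above). `MemberHit`, `DistanceFilter` and `WithinRadius` are hypothetical names for illustration; in practice the values would be parsed from the string fields that Examine returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one search hit (not an Examine type).
public class MemberHit
{
    public int Id { get; set; }
    public double Easting { get; set; }    // assumed to be metres
    public double Northing { get; set; }   // assumed to be metres
    public double DistanceKm { get; set; } // filled in by the filter
}

public static class DistanceFilter
{
    // Straight-line (Pythagorean) distance between two grid points, in kilometres.
    public static double DistanceKm(double e1, double n1, double e2, double n2)
    {
        double de = e1 - e2;
        double dn = n1 - n2;
        return Math.Sqrt(de * de + dn * dn) / 1000.0;
    }

    // Annotates each hit with its distance from the origin, drops hits outside
    // the radius, and returns the remainder nearest first.
    public static List<MemberHit> WithinRadius(
        IEnumerable<MemberHit> hits, double originEasting, double originNorthing, double radiusKm)
    {
        var result = new List<MemberHit>();
        foreach (var hit in hits)
        {
            hit.DistanceKm = DistanceKm(hit.Easting, hit.Northing, originEasting, originNorthing);
            if (hit.DistanceKm <= radiusKm)
            {
                result.Add(hit);
            }
        }
        return result.OrderBy(h => h.DistanceKm).ToList();
    }
}
```

Because this is plain arithmetic per result, it stays cheap even when run over every hit, which matches the "no perceptible delay" observation above.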
Really pleased with this; it has taken a search that was too slow to use without users becoming annoyed and giving up, and made it fast enough to do AJAX search-as-you-type. It also shows that it's actually pretty trivial to hook into Examine and do very high performance searches with data that is completely outside Umbraco.
(Note: The example above is extremely cut down for clarity, in case anyone is wondering why what looks like a simple members table would be so slow to search! For example, IsFeatured alone as a property is actually defined by looking up an actively purchased registration record within a particular date range with a particular product type and other properties, and there are many others of similar complexity.)
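As a rough sketch of the list-property approach in point 2, the helper below builds both the pipe-separated field value and the raw Lucene query string. `ServiceQuery`, `ToFieldValue` and `Build` are hypothetical names, and the escaping only covers single-word terms; the resulting string would then be passed to Examine as a raw query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class ServiceQuery
{
    // Joins a member's services into one pipe-separated value for the index.
    public static string ToFieldValue(IEnumerable<string> services)
    {
        return string.Join("|", services.Select(s => s.Trim().ToLowerInvariant()));
    }

    // Builds e.g. (Services:carpentry AND Services:plumbing).
    // Pass "OR" as the operator to match members offering any of the terms.
    public static string Build(string field, IEnumerable<string> terms, string op)
    {
        var clauses = terms
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .Select(t => field + ":" + Escape(t.Trim().ToLowerInvariant()))
            .ToList();
        if (clauses.Count == 0)
        {
            return string.Empty;
        }
        return "(" + string.Join(" " + op + " ", clauses) + ")";
    }

    // Backslash-escapes characters that have special meaning in Lucene query syntax.
    // Multi-word terms are not handled here.
    public static string Escape(string term)
    {
        const string special = "+-&|!(){}[]^\"~*?:\\/";
        var sb = new StringBuilder();
        foreach (char c in term)
        {
            if (special.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For the case in the original question (member can do A, F and G; user searches for A, D, E and should match on A), the "OR" operator is the one to use; "AND" gives the stricter "can do all of these" search.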
For point one, you can do geospatial searches, but you will need the contrib plugin; see http://www.leapinggorilla.com/Blog/Read/1005/spatial-search-in-lucenenet. As part of indexing your members, take each member's post code, geocode it and store the result in the index. At search time, geocode the searched post code and then do the spatial search. Here is some old code of mine for a DB indexer:
    public IEnumerable<SimpleDataSet> GetAllData(string indexType)
    {
        return GetData(indexType);
    }

    private IEnumerable<SimpleDataSet> GetData(string indexType)
    {
        var config = new Config();
        using (var sqlConn = new SqlConnection(config.ConnectionString))
        {
            sqlConn.Open();
            foreach (var key in config.TableQueryLookUp.Keys)
            {
                var objSqlCmd = new SqlCommand { Connection = sqlConn, CommandText = config.TableQueryLookUp[key] };
                using (var reader = objSqlCmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        int fields = reader.FieldCount;
                        var sds = new SimpleDataSet { NodeDefinition = new IndexedNode(), RowData = new Dictionary<string, string>() };
                        for (int i = 0; i < fields; i++)
                        {
                            if (i == 0)
                            {
                                sds.NodeDefinition.NodeId = Convert.ToInt32(reader[0]);
                                sds.NodeDefinition.Type = key;
                            }
                            else
                            {
                                sds.RowData.Add(reader.GetName(i), XmlCharacterWhitelist(reader[i].ToString()));
                            }
                        }
                        yield return sds;
                    }
                }
            }
            sqlConn.Close();
        }
    }
Note the yield statement, so indexing can start before all the results have been returned. With regard to the config, I don't think there is a way to do it dynamically with the simple indexer; you have to add the fields to the XML config.
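To illustrate what the yield statement buys you, here is a small stand-alone sketch with no Examine or database involved. `LazyRows`, `ReadRows` and `IndexAll` are hypothetical stand-ins for the reader loop and the indexer. The log shows that each row is handed to the consumer as soon as it is read, rather than after the whole list has been built.

```csharp
using System;
using System.Collections.Generic;

public static class LazyRows
{
    // Records the order in which rows are produced and consumed.
    public static readonly List<string> Log = new List<string>();

    // Stand-in for the data reader loop: each row is produced only when
    // the consumer asks for the next item.
    public static IEnumerable<int> ReadRows(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Log.Add("read " + i);
            yield return i;
        }
    }

    // Stand-in for the indexer consuming the rows one at a time.
    public static void IndexAll(IEnumerable<int> rows)
    {
        foreach (int row in rows)
        {
            Log.Add("index " + row);
        }
    }
}
```

Calling LazyRows.IndexAll(LazyRows.ReadRows(2)) leaves the log as: read 1, index 1, read 2, index 2. One thing to keep in mind with this pattern is that the connection in the real indexer stays open for as long as the results are being enumerated, since the using block is only exited once the consumer finishes.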
Regards
Ismail
Ah, I will look at converting mine to use yield; very good point, thanks!
The spatial stuff is a bit overkill for this project, but thanks for the link; it looks very interesting.
Hi Rob,
Did you use these fields in the Examine.config file?