Creating faceted search with Examine
Hi
I have set up Examine, and it is running as expected.
Now I would like to go one step further and turn it into a faceted search. Faceted search is the ability to filter on multiple facets and is well known from many e-commerce sites. For example, when searching for a camera, you can filter on Manufacturer, Price and Resolution at the same time. Typically you will see a count of the expected hits, e.g. when selecting Canon I can see how many Canon cameras are available before doing my actual search. Most of the time the facets and hit counts update when a user searches, e.g. if I select Canon as manufacturer, I will see only the Resolutions that will actually give me a search result (see http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr).
I have figured out how to combine my different facets, allowing the user to search on Manufacturer, Price and Resolution together. What I haven't figured out is how to do the hit count and leave out the facets that will not return anything.
I have found this article on how to do it with Lucene.Net and BitArrays:
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
But I cannot figure out how to create a BitArray from my search results using Examine for Umbraco. Or is there a better way of doing faceted search with Examine? Any help is much appreciated!
thanks
Thomas
Hey Thomas,
Did you get anywhere with this? I'm interested in doing something similar but am unsure whether Examine is comprehensive enough to do it.
Thanks for any help
/j
Hi Jay,
yes, I got it working.
Step 1
I first created my query using Lucene.Net (http://incubator.apache.org/lucene.net/). It might not be necessary to build the query in Lucene.Net, but I needed it in Lucene.Net to get the hit count (see step 3 below). If there is a way to get the Lucene.Net query from Examine, I guess you could just create your query in Examine and then convert it to Lucene.Net to get the hit count.
Step 2
I then transformed that to Examine (mainQuery is my Lucene.Net query) and did the search.
string searchIndex = NAME_OF_INDEX; // placeholder: the name of your Examine searcher
Examine.SearchCriteria.ISearchCriteria criteria = ExamineManager.Instance.CreateSearchCriteria(BooleanOperation.And);
criteria = criteria.RawQuery(mainQuery.ToString());
ISearchResults results = ExamineManager.Instance.SearchProviderCollection[searchIndex].Search(criteria);
Step 3
I then built the faceted search.
Step 3.1
First I implemented the following class:
using System;
using System.Collections;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class BitSetFacetHitCounter
{
    // Based on http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
    private Query baseQuery;
    private Dictionary<string, Query> subQueries;
    private IndexSearcher searcher;
    private IndexReader reader;
    private QueryFilter baseQueryFilter;
    private BitArray baseBitArray;

    public BitSetFacetHitCounter()
    {
    }

    // Call setSearcher before this; otherwise reader is null and baseBitArray is not filled.
    public void setBaseQuery(Query baseQuery)
    {
        this.baseQuery = baseQuery;
        baseQueryFilter = new QueryFilter(baseQuery);
        if (reader != null)
            baseBitArray = baseQueryFilter.Bits(reader);
    }

    public void setSubQueries(Dictionary<string, Query> subQueries)
    {
        this.subQueries = subQueries;
    }

    public void setSearcher(IndexSearcher searcher)
    {
        this.searcher = searcher;
        reader = searcher.GetIndexReader();
    }

    // Hit counts for all sub-queries, each restricted to the documents matching the base query.
    public Dictionary<string, Int32> getFacetHitCounts()
    {
        Dictionary<string, Int32> facetCounts = new Dictionary<string, int>(subQueries.Keys.Count);
        foreach (string key in subQueries.Keys)
        {
            QueryFilter filter = new QueryFilter(subQueries[key]);
            BitArray filterBitArray = filter.Bits(reader);
            facetCounts.Add(key, getFacetHitCount(baseBitArray, filterBitArray));
        }
        return facetCounts;
    }

    // Hit count for a single facet query, restricted to the documents matching the base query.
    public int getFacetHitCount(Query query)
    {
        QueryFilter filter = new QueryFilter(query);
        BitArray filterBitArray = filter.Bits(reader);
        return getFacetHitCount(baseBitArray, filterBitArray);
    }

    private int getFacetHitCount(BitArray baseBitArray, BitArray filterBitArray)
    {
        // And() modifies filterBitArray in place, leaving only the bits set in both arrays.
        filterBitArray.And(baseBitArray);
        return GetCardinality(filterBitArray);
    }

    // Counts the set bits using a 256-entry lookup table (bits set per byte value).
    private int GetCardinality(BitArray bitArray)
    {
var _bitsSetArray256 = new byte[] { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8 };
        // Reads BitArray's private backing field via reflection; the field name "m_array" is an implementation detail.
        var array = (uint[])bitArray.GetType().GetField("m_array", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance).GetValue(bitArray);
        int count = 0;
        // Add up the set bits of each of the four bytes in every 32-bit word.
        for (int index = 0; index < array.Length; index++)
            count += _bitsSetArray256[array[index] & 0xFF] + _bitsSetArray256[(array[index] >> 8) & 0xFF] + _bitsSetArray256[(array[index] >> 16) & 0xFF] + _bitsSetArray256[(array[index] >> 24) & 0xFF];
        return count;
    }
}
Step 3.2
I then implemented the following method in the same class that contains the code from step 2.
private void FacetedSearch(Lucene.Net.Search.Query baseQuery, string searchIndex)
{
    UmbracoExamineSearcher examineSearcher = (UmbracoExamineSearcher)ExamineManager.Instance.SearchProviderCollection[searchIndex];
    Lucene.Net.Search.IndexSearcher searcher = (Lucene.Net.Search.IndexSearcher)examineSearcher.GetSearcher();
    BitSetFacetHitCounter facetHitCounter = new BitSetFacetHitCounter();
    // setSearcher must come before setBaseQuery, because setBaseQuery needs the reader.
    facetHitCounter.setSearcher(searcher);
    facetHitCounter.setBaseQuery(baseQuery);
    // Iterate through all the terms you want a hit count for.
    // "terms" is a list of facet values that is defined elsewhere in the class (not shown here).
    for (int i = terms.Count - 1; i >= 0; i--)
    {
        string term = terms[i];
        // NAME_OF_FIELD is a placeholder for the index field the facet value lives in.
        var termQuery = new TermQuery(new Term(NAME_OF_FIELD, term));
        // This gives you the hit count for the term. You can add it to a collection (e.g. a Dictionary) to hold the term and the termCount.
        int termCount = facetHitCounter.getFacetHitCount(termQuery);
    }
}
Step 3.3
I then changed step 2 to the following to add the faceted hit count:
string searchIndex = NAME_OF_INDEX; // placeholder: the name of your Examine searcher
Examine.SearchCriteria.ISearchCriteria criteria = ExamineManager.Instance.CreateSearchCriteria(BooleanOperation.And);
criteria = criteria.RawQuery(mainQuery.ToString());
ISearchResults results = ExamineManager.Instance.SearchProviderCollection[searchIndex].Search(criteria);
FacetedSearch(mainQuery, searchIndex);
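To illustrate what the hit counter does with the bit arrays, here is a minimal sketch that uses only System.Collections.BitArray, with no Lucene.Net or Examine involved. The names FacetCountSketch, CountIntersection and NonEmptyFacetCounts are made up for this example, and the bit arrays you pass in are stand-ins for what QueryFilter.Bits(reader) returns (one bit per document). It also shows one possible way to leave out the facets that would return nothing, by simply skipping counts of zero.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not part of Examine or Lucene.Net.
public static class FacetCountSketch
{
    // Counts the documents whose bit is set in both arrays, without modifying either one.
    public static int CountIntersection(BitArray baseBits, BitArray facetBits)
    {
        int length = Math.Min(baseBits.Length, facetBits.Length);
        int count = 0;
        for (int i = 0; i < length; i++)
        {
            if (baseBits[i] && facetBits[i])
            {
                count++;
            }
        }
        return count;
    }

    // Returns the hit count per facet value, keeping only the values
    // that still match at least one document of the base result.
    public static Dictionary<string, int> NonEmptyFacetCounts(
        BitArray baseBits, IDictionary<string, BitArray> facetBits)
    {
        var counts = new Dictionary<string, int>();
        foreach (KeyValuePair<string, BitArray> facet in facetBits)
        {
            int hits = CountIntersection(baseBits, facet.Value);
            if (hits > 0)
            {
                counts[facet.Key] = hits;
            }
        }
        return counts;
    }
}
```

For example, if the base query (say, manufacturer Canon) matches documents 0, 1 and 3, and a resolution value matches documents 0, 2 and 3, the count for that value is 2. A value that only matches documents 2 and 4 gets a count of 0 and is left out of the result.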
Hope this helps. If you have any questions I will be glad to try to answer them.
best regards
Thomas
private int GetCardinality(BitArray bitArray)
You should keep the lookup table used here in a static variable, so you don't have to rebuild it on each call.
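A rough sketch of how that could look, under my own assumptions: the 256-entry table is built once in a static readonly field, and the bits are read through the public BitArray.CopyTo method instead of reflection on the private field. The class name BitArrayCardinality and the field name BitsSetPerByte are invented for this example; this is not the code from the post above, just one possible variant of it.

```csharp
using System;
using System.Collections;

// Hypothetical variant of GetCardinality with a static lookup table.
public static class BitArrayCardinality
{
    // Built once: BitsSetPerByte[b] is the number of set bits in the byte value b.
    private static readonly byte[] BitsSetPerByte = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int i = 1; i < 256; i++)
        {
            // Set bits in i = lowest bit of i + set bits in (i >> 1).
            table[i] = (byte)((i & 1) + table[i >> 1]);
        }
        return table;
    }

    public static int GetCardinality(BitArray bitArray)
    {
        // Copy the bits into 32-bit words through the public API.
        var words = new int[(bitArray.Length + 31) / 32];
        bitArray.CopyTo(words, 0);

        // Clear any bits in the last word that lie beyond bitArray.Length.
        int extraBits = bitArray.Length % 32;
        if (extraBits > 0)
        {
            words[words.Length - 1] &= unchecked((1 << extraBits) - 1);
        }

        int count = 0;
        foreach (int word in words)
        {
            uint w = unchecked((uint)word);
            count += BitsSetPerByte[w & 0xFF]
                   + BitsSetPerByte[(w >> 8) & 0xFF]
                   + BitsSetPerByte[(w >> 16) & 0xFF]
                   + BitsSetPerByte[(w >> 24) & 0xFF];
        }
        return count;
    }
}
```

The copy costs one extra array allocation per call, so whether this is faster than the reflection version in practice is something you would have to measure on your own index.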
Here is how to get the raw Lucene query (haven't tested it myself):
http://our.umbraco.org/forum/developers/api-questions/32721-Get-raw-Lucene-query-after-using-the-Examine-Fluent-API
Hey Thomas,
very useful, thank you so much
/j
Hi Thomas,
Thanks for putting this code out there, I'm just trying to get it up and running to support various sections of our new site.
If you don't mind, could you please share the code you use to create 'mainQuery'?
Many thanks,
Chris
Hi Chris,
My mainQuery is created in Lucene.Net like this:
BooleanQuery mainQuery = new BooleanQuery();
For each of the facets I would like to search, I then did the following:
BooleanQuery someFacetQuery = new BooleanQuery();
// "[NAME OF FIELD TO SEARCH]" and termToSearchFor are placeholders for your own field name and facet value.
TermQuery query = new TermQuery(new Term("[NAME OF FIELD TO SEARCH]", termToSearchFor));
// SHOULD: any of the values selected within one facet may match.
someFacetQuery.Add(query, BooleanClause.Occur.SHOULD);
// MUST: every selected facet has to match.
mainQuery.Add(someFacetQuery, BooleanClause.Occur.MUST);
And as written above, you can also achieve the same using Examine's API: our.umbraco.org/.../32721-Get-raw-Lucene-query-after-using-the-Examine-Fluent-API
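If it helps to see the structure, here is a small sketch that builds the same kind of nested query as a plain Lucene query string: one required group per facet (the + prefix), with the selected values inside each group as optional alternatives. It uses no Lucene.Net types. The class name FacetQueryStringSketch and the field names in the example are made up, and the sketch assumes the values are single tokens without special characters, so it does no escaping. Whether such a string parses to exactly the same query as the hand-built BooleanQuery depends on the analyzer and the default operator of the query parser, so treat it as an illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Examine or Lucene.Net.
public static class FacetQueryStringSketch
{
    // Builds e.g. "+(manufacturer:canon manufacturer:nikon) +(resolution:12mp)".
    public static string Build(IEnumerable<KeyValuePair<string, string[]>> selectedFacets)
    {
        var groups = new List<string>();
        foreach (KeyValuePair<string, string[]> facet in selectedFacets)
        {
            // Skip facets where the user selected nothing.
            if (facet.Value == null || facet.Value.Length == 0)
            {
                continue;
            }
            // Alternatives within one facet: field:value pairs separated by spaces.
            IEnumerable<string> alternatives = facet.Value.Select(v => facet.Key + ":" + v);
            // The leading + makes the whole group required.
            groups.Add("+(" + string.Join(" ", alternatives) + ")");
        }
        return string.Join(" ", groups);
    }
}
```

Passing a list with manufacturer = canon, nikon and resolution = 12mp gives the string shown in the comment on Build.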
/Thomas