Indexing Tag Data with Examine for Multi-site/language install
I have a multi-site install and each site requires the ability to render its own tag clouds, either globally or per document type, e.g. all tags across an entire site or only those assigned to News Articles. In approaching this I've encountered a few problems, so I'm looking to the community for feedback/direction on the best approach.
I currently have a GatheringNodeData event as follows which is intended to massage the contents of the search index into something I can use:
private void SearchIndexerGatheringNodeData(object sender, IndexingNodeDataEventArgs e)
{
    if (e.IndexType == IndexTypes.Content)
    {
        try
        {
            // Split the path to the current node and extract the homepage id
            string rootNodeId = e.Fields["path"].Split(',')[2];
            e.Fields.Add("siteId", rootNodeId);

            // Replace the comma delimiters on the tags field with spaces so that
            // the tags can all be indexed individually. Not every node has a
            // "tags" field, so check for the key before reading it.
            if (e.Fields.ContainsKey("tags") && !string.IsNullOrEmpty(e.Fields["tags"]))
            {
                e.Fields["tags"] = e.Fields["tags"].Replace(",", " ");
            }
        }
        catch (Exception ex)
        {
            Log.Add(LogTypes.Error, -1, string.Concat("Failed to update search index node data: ", ex));
        }
    }
}
However, this has presented a couple of problems. One is that some tags are more than one word, so the words end up being indexed individually rather than together, e.g. "My Tag" is actually indexed as "My" and "Tag". What is the best way around this? I'm guessing I can't index quoted values.
I actually have more related issues, but I think it's better to post these separately for clarity.
Thanks, Simon
Simon,
If the search for "my tag" is done as a phrase search, then "my tag" will be an exact match.
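As a rough sketch of what a phrase search looks like in Lucene query syntax: the phrase is wrapped in double quotes after the field name. The helper below only builds that query string; the class and method names are made up for illustration, and how the string is handed to the searcher is not shown here.

```csharp
using System;

public static class PhraseQueryHelper
{
    // Builds a Lucene phrase query such as tags:"My Tag".
    // Backslashes and double quotes inside the phrase are escaped
    // so they cannot terminate the quoted phrase early.
    public static string PhraseQuery(string field, string phrase)
    {
        string escaped = phrase.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

One thing to keep in mind: because the commas in the tags field are replaced with spaces before indexing, two neighbouring tags such as "Oh My" and "Tag Cloud" would also produce the adjacent words "My Tag", so a phrase match on its own does not guarantee that "My Tag" was an actual tag.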
Regards
Ismail
Hi Ismail,
Out of interest how would I do a "phrase search"?
On a side note, even if that works, having the words separated causes incorrect tag weightings in the tag cloud. The following tags would give the term "Tag" a higher weighting when in fact it only occurs once:
Tag, My Tag
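To make the weighting problem concrete, here is a small standalone sketch (no Examine involved, and the class name is made up) that counts terms from the same tag string two ways: split on commas and spaces, which mirrors what happens once the commas are replaced with spaces, and split on commas only.

```csharp
using System;
using System.Collections.Generic;

public static class TagWeightDemo
{
    // Counts how often each term occurs after splitting on the given separators.
    public static Dictionary<string, int> Count(string value, params char[] separators)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in value.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            string term = raw.Trim();
            if (term.Length == 0) continue;

            int current;
            counts.TryGetValue(term, out current); // current is 0 when the term is new
            counts[term] = current + 1;
        }
        return counts;
    }
}
```

With the input "Tag, My Tag", splitting on both ',' and ' ' counts "Tag" twice and "My" once, whereas splitting on ',' alone counts "Tag" once and "My Tag" once, which is the weighting I actually want.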
Thanks, Simon
I think phrase search is the default. Not too long ago we had a client saying that when they searched for "Art dealers" they got no results, even though they had content containing the words "art" and "dealer". So what I had to do was check whether the query string contained a space and, if so, split it and build an OR query, so instead of "Art dealer" it was searching for art OR dealer. With regards to the other issue, the only thing I can think of is to create a new field and insert the tags with spaces replaced by _, so that each tag will be unique. I'm not sure what you are using to render the tags, but in that script/usercontrol etc., replace the _ with a space when you write out the tag.
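A minimal sketch of the underscore idea, kept free of any Examine types so it can be tried on its own. The class and method names are hypothetical; the encoded string is what would go into the extra index field, and the decode step would run wherever the tag cloud is rendered.

```csharp
using System;
using System.Linq;

public static class TagFieldHelper
{
    // Turns "Tag, My Tag" into "Tag My_Tag": one space-separated token per tag,
    // with the spaces inside a tag replaced by underscores.
    public static string EncodeTags(string commaSeparatedTags)
    {
        if (string.IsNullOrEmpty(commaSeparatedTags)) return string.Empty;

        var tokens = commaSeparatedTags
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.Trim())
            .Where(t => t.Length > 0)
            .Select(t => t.Replace(' ', '_'));

        return string.Join(" ", tokens.ToArray());
    }

    // Reverses the encoding for display, e.g. "My_Tag" becomes "My Tag".
    public static string DecodeTag(string indexedTag)
    {
        return indexedTag.Replace('_', ' ');
    }
}
```

Two caveats: a tag that already contains an underscore would come back with a space instead, and this only helps if the analyzer used for that field keeps My_Tag together as a single term, which depends on how the index is configured and is worth checking.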
Regards
Ismail
Thanks Ismail, I was thinking of doing something like that, or even adding in my own delimiter that I could strip back out when I render the tag clouds.
Cheers, Simon