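Thomas also added a tag-related search via XSLT:

```xml
<xsl:template match="/">
  <!-- Read the 'page' and 'tag' query-string parameters -->
  <xsl:variable name="page" select="umbraco.library:RequestQueryString('page')" />
  <xsl:variable name="tag" select="umbraco.library:RequestQueryString('tag')" />
  <!-- The rest of the original template was not preserved -->
</xsl:template>
```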
Find related pages on matching tags.
Hello Fellow Umbracos & Umbracas
We are building a semi-large website where we need to automatically list pages related to the current page.
The solution we are currently working on is similar to what most blogs use: every page is tagged with keywords. The idea was then to search for every page that has the same tags (or some of them), and maybe even sort the results by the number of matching tags.
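As a language-neutral illustration (Python here; in Umbraco the logic would live in an XSLT or .NET macro), "sort by the number of matching tags" amounts to scoring each candidate page by how many tags it shares with the current one. The `page_tags` dictionary and the `related_pages` function are hypothetical stand-ins for the tag data stored on each node, not an Umbraco API.

```python
from collections import Counter

def related_pages(current_id, page_tags, limit=5):
    """Return up to `limit` page ids that share at least one tag with
    `current_id`, ordered by shared-tag count (desc), then by id."""
    current = set(page_tags[current_id])
    scores = Counter()
    for page_id, tags in page_tags.items():
        if page_id == current_id:
            continue  # a page is not related to itself
        shared = len(current & set(tags))
        if shared:
            scores[page_id] = shared
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked[:limit]]

pages = {
    1: ["umbraco", "xslt", "tags"],
    2: ["umbraco", "tags"],
    3: ["xslt"],
    4: ["lucene"],
}
print(related_pages(1, pages))  # [2, 3]
```

Note that this scans every page on every call, which is exactly the per-view cost the question below is worried about.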
My question is how to do this without pushing the server to its limit on every page view. The 'Tags' datatype already ships with Umbraco by default, so I guess someone has already given it some thought ;-)
Any good suggestions? Maybe a better approach?
/Dan
Hi Dan,
I would use the same approach. I would use a separate macro to list the related posts using XPath or a search, and cache it so it doesn't hit the server resources every time someone visits the page.
Cheers,
Richard
Unfortunately the current Tags implementation doesn't provide tag counts. I made a blog cumulus package where I extended Umbraco's ITag interface to get all the related pages. Perhaps this is a starting point for you. See here
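The missing piece Thomas mentions, a usage count per tag, is cheap to compute once tag assignments are available as (node id, tag) pairs. That pair shape is an assumption for illustration, loosely modeled on a node/tag relationship table, and `tag_counts` is a hypothetical helper, not part of the ITag interface.

```python
from collections import Counter

def tag_counts(assignments):
    """Count how many distinct nodes use each tag.

    `assignments` is an iterable of (node_id, tag) pairs; a repeated
    pair is counted only once.
    """
    return Counter(tag for _, tag in set(assignments))

rows = [(1, "umbraco"), (2, "umbraco"), (2, "xslt"), (2, "xslt"), (3, "lucene")]
counts = tag_counts(rows)
print(counts.most_common(1))  # [('umbraco', 2)]
```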
I also added a tag-related search via XSLT:
hth, Thomas
The key with any of these options is to set the cache time on the macro to something fairly large. I'd go for at least 10 minutes, maybe an hour (or even once a day). It depends on how quickly changes need to show up in the related-pages output.
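In Umbraco the cache period is a setting on the macro itself (cached per page, since each page has its own related list), so no code is needed. The sketch below only illustrates what a cache period buys you: the expensive lookup runs at most once per period for each key, and every other request is served from memory. `TtlCache` is a hypothetical helper, not an Umbraco API. The clock is injectable so the expiry can be demonstrated without waiting.

```python
import time

class TtlCache:
    """Keep each computed value for `ttl_seconds`, like a macro cache period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable for testing
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the expensive work
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value

calls = []
def expensive_lookup():
    calls.append(1)              # record each real computation
    return [2, 3]

fake_now = [0.0]
cache = TtlCache(600, clock=lambda: fake_now[0])   # 10 minutes
cache.get_or_compute("related:1", expensive_lookup)
cache.get_or_compute("related:1", expensive_lookup)  # served from cache
fake_now[0] = 601
cache.get_or_compute("related:1", expensive_lookup)  # expired: recomputed
print(len(calls))  # 2
```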
If you need live or nearly live updates and can't use a long cache time, or the macro is simply too slow on the rare occasions it needs to run, look into using the Lucene search to pull back the results. It will probably need a bit of customization, but performance will be exceptionally fast because Lucene is index-based, and you can still cache its output if needed.
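"Index-based" is the important part. A search engine like Lucene keeps an inverted index (term to documents), so a query only touches pages that actually carry one of the current page's tags instead of scanning every page. The toy in-memory version below illustrates that idea only. It is not Lucene's API, and `build_index` and `related_from_index` are hypothetical names.

```python
from collections import Counter, defaultdict

def build_index(page_tags):
    """Map each tag to the set of pages carrying it (an inverted index)."""
    index = defaultdict(set)
    for page_id, tags in page_tags.items():
        for tag in tags:
            index[tag].add(page_id)
    return index

def related_from_index(current_id, page_tags, index, limit=5):
    """Score only the candidate pages found through the index."""
    scores = Counter()
    for tag in set(page_tags[current_id]):
        for page_id in index.get(tag, ()):
            if page_id != current_id:
                scores[page_id] += 1   # one point per shared tag
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked[:limit]]

pages = {1: ["umbraco", "xslt", "tags"], 2: ["umbraco", "tags"], 3: ["xslt"], 4: ["lucene"]}
index = build_index(pages)
print(related_from_index(1, pages, index))  # [2, 3]
```

The index must be rebuilt or updated when tags change. That is the work Lucene does for you when content is published.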
cheers,
doug.
How many nodes will this semi-large site have?
Thomas
First of all, I had forgotten about Umbraco's cache feature. I'm still pretty new to both Umbraco and .NET, but slowly getting there. My guess is that it will do the trick perfectly, with the list updating just twice a day.
Secondly, I also had the idea of an SQL table listing the related pages for every page. The table would be updated every time a page was created or changed, so the processing would happen only once instead of on every page view. Any comments on this solution? Anyway, I'll see whether the cache does the trick first, which I think it may.
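A minimal sketch of Dan's precomputed-table idea, using SQLite in memory. The schema, table names and `save_page` function are all hypothetical. One detail is easy to miss: relations are symmetric, so changing one page's tags also changes the related lists of the pages it was related to before and after the change, and all of those rows must be recomputed.

```python
import sqlite3

# Hypothetical schema: one row per (page, tag) and one row per related pair.
SCHEMA = """
CREATE TABLE page_tags (page_id INTEGER, tag TEXT, PRIMARY KEY (page_id, tag));
CREATE TABLE related_pages (page_id INTEGER, related_id INTEGER, shared INTEGER,
                            PRIMARY KEY (page_id, related_id));
"""

def refresh_related(conn, page_id):
    """Recompute the related rows for one page from the current tags."""
    conn.execute("DELETE FROM related_pages WHERE page_id = ?", (page_id,))
    conn.execute(
        """
        INSERT INTO related_pages (page_id, related_id, shared)
        SELECT a.page_id, b.page_id, COUNT(*)
        FROM page_tags AS a
        JOIN page_tags AS b ON a.tag = b.tag AND a.page_id <> b.page_id
        WHERE a.page_id = ?
        GROUP BY a.page_id, b.page_id
        """,
        (page_id,),
    )

def related_of(conn, page_id):
    return {r for (r,) in conn.execute(
        "SELECT related_id FROM related_pages WHERE page_id = ?", (page_id,))}

def save_page(conn, page_id, tags):
    """Call on create/change: store the tags, then refresh affected pages."""
    with conn:  # commit on success, roll back on error
        affected = {page_id} | related_of(conn, page_id)   # related before
        conn.execute("DELETE FROM page_tags WHERE page_id = ?", (page_id,))
        conn.executemany("INSERT INTO page_tags (page_id, tag) VALUES (?, ?)",
                         [(page_id, tag) for tag in set(tags)])
        refresh_related(conn, page_id)
        affected |= related_of(conn, page_id)              # related after
        for pid in affected:
            refresh_related(conn, pid)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_page(conn, 1, ["umbraco", "xslt", "tags"])
save_page(conn, 2, ["umbraco", "tags"])
save_page(conn, 3, ["xslt"])
print(conn.execute(
    "SELECT related_id, shared FROM related_pages WHERE page_id = 1 "
    "ORDER BY shared DESC, related_id").fetchall())  # [(2, 2), (3, 1)]
```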
Thomas: number of nodes? Not sure, actually. Potentially 500 a year. Not sure if that counts as a "semi-large website", though. I guess it's relative.
/Dan
Hi Dan,
I wouldn't go straight for a custom solution that uses a DB. What happens if, for some reason, the event isn't triggered? Or if someone performs an action that isn't handled through events? Then you would have a corrupt index. Querying with XPath is pretty fast, and with a good caching strategy it should perform well.
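Richard's concern can be made concrete. If an event-maintained list misses a save, nothing fails loudly; the stored relations simply drift from the tags. The only way to notice is to recompute from the tags (the source of truth) and compare, as sketched below with hypothetical names. If you end up recomputing periodically anyway, that argues for simply caching the query instead.

```python
def compute_related(page_tags):
    """Recompute every page's related set directly from the tags."""
    related = {}
    for page_id, tags in page_tags.items():
        tags = set(tags)
        related[page_id] = {
            other for other, other_tags in page_tags.items()
            if other != page_id and tags & set(other_tags)
        }
    return related

def find_drift(stored, page_tags):
    """Return the pages whose stored related set no longer matches the tags."""
    fresh = compute_related(page_tags)
    return sorted(pid for pid in fresh.keys() | stored.keys()
                  if fresh.get(pid, set()) != stored.get(pid, set()))

pages = {1: ["umbraco", "tags"], 2: ["umbraco"], 3: ["lucene"]}
stored = compute_related(pages)   # consistent at this point
pages[3] = ["umbraco"]            # edited, but suppose the save event never fired
print(find_drift(stored, pages))  # [1, 2, 3]
```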
Cheers,
Richard
You could also write a custom datatype that stores the IDs of the related nodes on each node at publish time, so every node has its related nodes directly available (no need to search anymore).
How to do it: use ActionHandlers to add the IDs of the nodes that share the same tags. Take a look at the cmsTags and cmsTagRelationship tables, and also at the umbraco.editorcontrols.tags.library class.
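The lookup such an ActionHandler would run at publish time is a self-join on the tag relationship table: find every other node that shares a tag with the one being published, counting the shared tags. The sketch below uses an in-memory SQLite stand-in and assumes that cmsTagRelationship has `nodeId` and `tagId` columns, so check this against your actual database. The handler itself (C#) is not shown, and `related_ids_for_publish` is a hypothetical helper that returns a comma-separated ID string a custom datatype could store.

```python
import sqlite3

# Hypothetical stand-in; column names are an assumption, verify your schema.
SCHEMA = "CREATE TABLE cmsTagRelationship (nodeId INTEGER, tagId INTEGER);"

RELATED_SQL = """
SELECT other.nodeId, COUNT(*) AS shared
FROM cmsTagRelationship AS mine
JOIN cmsTagRelationship AS other
  ON other.tagId = mine.tagId AND other.nodeId <> mine.nodeId
WHERE mine.nodeId = ?
GROUP BY other.nodeId
ORDER BY shared DESC, other.nodeId
LIMIT ?
"""

def related_ids_for_publish(conn, node_id, limit=10):
    """Comma-separated related node IDs, most shared tags first."""
    rows = conn.execute(RELATED_SQL, (node_id, limit)).fetchall()
    return ",".join(str(nid) for nid, _ in rows)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cmsTagRelationship VALUES (?, ?)",
                 [(1050, 1), (1050, 2), (1051, 1), (1051, 2), (1052, 2), (1053, 3)])
print(related_ids_for_publish(conn, 1050))  # 1051,1052
```

Storing the result on the node makes rendering a plain read. The trade-off is the same as with Dan's table: pages published earlier are not updated when a newer page gains a matching tag, unless they are republished.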
If you need help just contact me.
Thomas
Hi Dan,
Did you ever get this working? I am working on the same thing.
best regards,
Brian