@Charles, below is some pseudo code that is basically what we have used to access the URL property. As mentioned above, this will only work in the main UI thread as the URL property is dependent on UmbracoContext.Current:

    using umbraco;                        // legacy content class
    using umbraco.cms.businesslogic;      // DocumentCacheEventArgs
    using umbraco.cms.businesslogic.web;  // Document
    using Umbraco.Core;
    using Umbraco.Web;

    public class MyEventHandler : ApplicationEventHandler
    {
        protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
        {
            // Legacy event, raised once the node's XML is in the document cache
            content.AfterUpdateDocumentCache += ContentOnAfterUpdateDocumentCache;
        }

        private void ContentOnAfterUpdateDocumentCache(Document sender, DocumentCacheEventArgs documentCacheEventArgs)
        {
            // IContent (loaded from the database)
            var id = sender.Id;
            var doc = ApplicationContext.Current.Services.ContentService.GetById(id);

            // IPublishedContent (loaded from the cache; needs UmbracoContext.Current)
            var umbracoHelper = new UmbracoHelper(UmbracoContext.Current);
            var publishedDoc = umbracoHelper.TypedContent(id);
        }
    }
Getting the url for an IContent object
Haven't found an answer to this yet after several hours of hacking and Googling/Binging.
The site I am working on makes heavy use of URL aliases, and we automatically generate these when something is published using Umbraco's built-in event handling. We are in the process of upgrading from v4 to v6 and I am getting to grips with the new API. Ideally I need to get the URL from the IContent object, but this isn't available, which is understandable as IContent does not represent a published item. I can try to get the IPublishedContent equivalent using the UmbracoHelper, but this only works if the item is already published: for newly created items being published for the first time it will be null until the document cache is updated. The UmbracoHelper.NiceUrl(int) method also just gives me '#', probably for the same reason.
So the question is: can I get a 'nice URL' from the IContent object in any way? I would think there must be some kind of utility method in Umbraco that does this at some point during the core publishing process, although whether this is publicly available I wouldn't know.
I am currently hooking into the legacy umbraco.content.AfterUpdateDocumentCache to get round this, but ideally I would like to handle this in the new ContentService.Save event.
Hi, I have had the same problem. Can you post the code you are using and the version of Umbraco you are using?
Hi Andy,
You can't get the url until the page is published. After it's published you need to get the url from an IPublishedContent object, via methods such as the UmbracoHelper or UmbracoContext.
I.E.
// UmbracoContext cannot be created with "new" from user code; use the current request's context
Umbraco.Web.UmbracoContext u = Umbraco.Web.UmbracoContext.Current;
string pageUrl = u.ContentCache.GetById(2).Url;
Hi Summit,
Yes, as stated in my question you can get the URL from the IPublishedContent object, but there are some limitations on this. Firstly, if this is a newly created node that is being published for the first time, then it appears that the IPublishedContent object isn't available during the Publishing or Published events. This is a little counter-intuitive, but it seems you need to wait until the legacy AfterUpdateDocumentCache event before you can get it.
Also, the Url property on the IPublishedContent object is computed at runtime and depends on UmbracoContext.Current being available. This has been problematic for us as we have a very large site (>100,000 nodes), and during app start up we build some caches of POCOs based on specific nodes and their properties, which are persisted to help with performance rather than performing the look up during page requests. Ideally we want to farm this off to a separate thread, as these caches can take a little while to build, but as one of the node properties we are interested in is the URL, we cannot run this anywhere other than the main app thread due to the dependency on UmbracoContext.Current. Essentially it appears that the DefaultUrlProvider is the class that handles URL generation, but its constructor is internal so it can't be created on the fly, and the property that references it on the UmbracoContext object is also internal, so I can't pass it through as a reference to another thread.
Slight rant > Overall, I've spent a lot of time on this and not really managed to find a satisfactory solution. I can understand some of the reasoning behind the code design, but it does seem a little strange that I am finding it so hard to get a page's URL in a CMS. Of course I could be a complete idiot and missing something glaringly obvious; if so, please tell me.
Hi Charles, forgot to mention that this is based on v6.1.6.
OK, must say that when I designed how urls are produced the assumption was that we would look for urls of published items, once they are published. And then it becomes quite easy, configurable, whatever. I did not think about somebody looking for urls of possibly non-published items, or of currently-being-published items. So yes, it's going to be painful or not possible.
That being said... there's no reason it cannot be fixed. Under some conditions. Don't have time to go into details right now - later today.
Cheers Stephen. Without going into too much detail, the two things that we are trying to do here are, firstly, to auto-generate a URL alias based on the node's URL during the publish event (it would be awesome to get this during the saving/save event so we can update and save whilst suppressing the additional save event) and, secondly, to build up some caches in a background thread, which involves mapping some node properties to a POCO that includes the URL. I appreciate that this is not commonly required, but I could imagine there are other scenarios where developers would want to get at this property a little more easily.
Have ideas on how to solve the "get the url" thing + you may actually have unearthed some issues with urls when previewing. Working on it at the moment.
Now, about the "mapping some node properties to a POCO"... if I understand correctly you want to bind an object to a property of a node, and cache that, right? Thinking... if you were to create a property value converter for that property type, returning the POCO... that would work but using the current XML cache, it would compute the POCO at least once per request. So you want it cached for... how long?
In any case I think it could be possible to simplify things by using a property value converter, and managing a cache of converted values from within that converter. So you would not need to pre-generate anything... would be generated the first time it's required. But maybe I don't fully understand your scenario?
Very interesting anyway.
Stay tuned for the URL thing.
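For what it's worth, the "cache of converted values, generated the first time it's required" idea could look roughly like the sketch below. This is only an illustration and not Umbraco API: ConvertedValueCache, GetOrConvert and Invalidate are made-up names, and the converter itself (however it is registered in your Umbraco version) would be what calls into it.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of Umbraco: converts a node's raw value to a
// POCO the first time it is requested and reuses the result afterwards.
public sealed class ConvertedValueCache<TPoco>
{
    private readonly ConcurrentDictionary<int, Lazy<TPoco>> _items =
        new ConcurrentDictionary<int, Lazy<TPoco>>();

    // Returns the cached POCO for the node, running 'convert' only on first use.
    public TPoco GetOrConvert(int nodeId, Func<int, TPoco> convert)
    {
        // Only the Lazy<T> that ends up stored is ever evaluated, so the
        // conversion runs at most once per id even if several threads race.
        return _items.GetOrAdd(nodeId, id => new Lazy<TPoco>(() => convert(id))).Value;
    }

    // Call when the node is republished so the next read converts again.
    public void Invalidate(int nodeId)
    {
        Lazy<TPoco> ignored;
        _items.TryRemove(nodeId, out ignored);
    }
}
```

Invalidate would presumably be called from whatever publish or cache-refresh event you already handle, so stale POCOs are dropped rather than kept for the lifetime of the app.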
Right now, getting the URL of a non-published node fails because we create the URLs from IPublishedContent instances, and as long as the node is not fully published we cannot load an IPublishedContent from the cache. BUT internally we have ways to create an IPublishedContent instance from an IContent. So, currently investigating how we could use that possibility to get URLs -- ie, if the IPublishedContent cannot be found in the cache, create one from an IContent. The only drawback is that an IContent comes from the database, so there would be a performance hit.
EDIT: but when trying to get the URL of an IContent... we already have that IContent. And the "not yet in cache" problem should only impact that node, and none of its parents. So the performance hit might actually not be significant.
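The "look in the cache first, fall back to the database only for the node that is missing" idea can be sketched as below. This is just the shape of the fallback, not how the core does or will do it: UrlLookup and both delegates are hypothetical, and treating "#" as "no URL" is an assumption based on what NiceUrl was reported to return earlier in the thread.

```csharp
using System;

// Hypothetical sketch of a cache-first lookup with a slower fallback.
public static class UrlLookup
{
    // fromCache:    fast lookup, assumed to return null, "" or "#" when the
    //               node is not in the content cache yet.
    // fromDatabase: slow lookup, only invoked when the cache has no answer.
    public static string Resolve(int nodeId, Func<int, string> fromCache, Func<int, string> fromDatabase)
    {
        var url = fromCache(nodeId);
        if (!string.IsNullOrEmpty(url) && url != "#")
        {
            return url; // cache hit, no database access needed
        }
        return fromDatabase(nodeId);
    }
}
```

Because only the not-yet-cached node takes the slow path, the cost of the fallback stays proportional to the number of nodes currently being published, which matches the point made in the EDIT above.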
Second thing is... I think you raise an issue with using the UmbracoContext. Yes we need an UmbracoContext to do a lot of things, and a detached thread would not have a context. That being said... have you tried any of the UmbracoContext.EnsureContext() methods? I think you could do something like
and it would make sure a proper UmbracoContext is registered. Now the issue of course is to create a proper HttpContextBase. It's not that complex actually. Suggesting you have a look at Umbraco.Web.Standalone.StandaloneHttpContext. It's internal but you could use it as an inspiration to create your own HttpContextBase implementation. Then... That would give you a proper UmbracoContext in that detached thread.
OK, a little detail with regard to the caching side. We have a news item doc type that is used throughout our site (as mentioned, we are talking over 100,000 published nodes). News items can be referenced from one or more news list pages, and there are various rules regarding where and when they should be displayed, e.g. the global news list should show links to the news items with the most recent publish date (this isn't the Umbraco publish date but a node datepicker property), label news should show links to recent news items from its news list plus news items from the label's artists, genre news should show any news links where the news item's parent artist has this genre specified, etc. This is too expensive to generate at runtime so at start up we get all news items and build List
Hope this explains a bit more what we are trying to do. Building all these news caches takes about a minute or two, and we can live with this delay in application start up at the moment as it is configured to happen only on our editor site. The public-facing servers in the web farm read the relevant caches from RavenDB only when the news list pages are requested.
Oooh. Nice. So you're building some sort of content indexes, maintained in RavenDB. Have been doing something similar on a website that had a media library with complex rules on which media to include/exclude. So we compute the IDs of the medias upon the first request and cache that. But just the IDs, no content, and cached in-memory.
See my notes on UmbracoContext, that should help running stuff in a detached thread. Still working on the URL part to try to do something as clean as possible.
Great stuff. I think I did start looking at the EnsureContext method, ran into that exact issue with the HttpContext and stopped at that point. A bit tied up with some other stuff now, but I will definitely revisit that and try out your suggestion later in the week.
We did originally just store IDs in the cache and generate and process the nodes at runtime, but this hasn't scaled too well, so now we're trying to do all the heavy lifting up front. However, talking about this has made me think that maybe we don't need to store the URL in advance, and maybe it could just be looked up at runtime without taking too much of a hit.
Thanks for all your help, this has been really useful, and I look forward to seeing what you come up with regarding getting the URLs from an IContent. This would be massively useful for us.
Hi Stephen,
I'm trying to get a URL in a console app. Because there's no HttpContext, I can't initialise the Umbraco context. I was just wondering if you had a workaround for getting a URL from IContent in that kind of scenario?
Cheers,
Tom
@Tom: to make it short, no.
Two things here.
(a) To get the url of a content (IContent) you need to map it to an IPublishedContent and use the content cache. The url generation belongs to IPublishedContent. So you need a content cache in your console app. There's no way around it at the moment[1]. It's not impossible. In fact there's a Umbraco.Web.Standalone namespace that contains stuff to be used in order to create a console app that loads a content cache. And I have a prototype app that lists content + urls. BUT this is completely unstable hence internal.
[1] And... I'm not sure enabling url generation on IContent is a good idea. IContent relies on the DB and it's obviously slower than IPublishedContent. Working on top of the content cache makes much more sense IMHO.
So for console apps... can be done, but it's a matter of stabilizing the Standalone stuff. And... in fact at the moment anytime we add features to Core they sort of break Standalone by assuming that we run in a web app. Not going to be ready soon but it's something I'm working on, and if anybody wants to help...
(b) To get the url of a content (IContent) that is not yet in the content cache (eg when handling the Published event)... that's what triggered this discussion... it's more complex. What if the parent's content is not in the cache either? What if nothing's in the cache yet? Easiest solution is to tell people to wait for the AfterUpdateDocumentCache event and only compute urls at that time. Not sure it's OK though.
Alternatively there's a "no cache" that's a work-in-progress. It's a content cache that bypasses the XML cache and directly hits the DB. If we could temporarily replace the current (XML) cache and use the "no cache", then we could compute urls before the XML cache is ready. BUT that would be quite slow.
OK, this turned into some sort of brain-dump... to make it short: the whole url generation process is intended to work on top of a content cache, for perfs purposes. Yet Andy's remark about "it should be simple to get the url of a content" is perfectly valid, and I find it a bit strange that when the Published event triggers, you still cannot get the matching IPublishedContent. Need more time to check how it all works and try to find a sensible answer.
Haven't looked at this in a couple of weeks as I have been tied up with other stuff, but I now have a little time to spend on it again. Interesting reading your braindump, Stephen. I agree that a DB-based URL generation process will be slower and will have to be used with this in mind. Unfortunately I don't see any other way of achieving this.
I have been using the legacy AfterUpdateDocumentCache event as a way to ensure that the XML for the node is in the cache, so that I can create the IPublishedContent and access the URL property in order to generate the URL alias, but there are a couple of issues here.
Firstly, once I have generated and saved the URL alias the process starts all over again, so I need to ensure that when the AfterUpdateDocumentCache event gets called again it doesn't repeat the process ad infinitum. This isn't too difficult, but (I think in terms of logical process and code, feel free to disagree) it would be preferable to do this in the Content.Saving event so that it can be processed and saved whilst suppressing future events.
Secondly, we have a requirement to support publishing through the timer, which runs in a background context that has no access to HttpContext or UmbracoContext, so this just won't work at all here. As far as I can tell, catching events thrown by the publishing timer currently means there is absolutely no way to get the URL, whatsoever, end of story, goodbye.
Would love to hear if you have any more thoughts on this, or whether any progress has been made on accessing the URL without so much dependency on UmbracoContext or an updated cache.
Andy
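On the first issue above (the handler's own save re-triggering the event), one way to stop the loop is a small re-entrancy guard keyed on node id. This is a minimal sketch under assumptions, not the code Andy used: ReentrancyGuard and TryRun are hypothetical names, and it presumes the re-triggered event arrives while the first call is still running.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers which node ids are currently being processed
// so a handler can skip the event raised by its own save.
public sealed class ReentrancyGuard
{
    private readonly HashSet<int> _inProgress = new HashSet<int>();
    private readonly object _lock = new object();

    // Runs 'work' unless the same id is already being processed.
    // Returns true if the work ran, false if the call was skipped.
    public bool TryRun(int nodeId, Action work)
    {
        lock (_lock)
        {
            if (!_inProgress.Add(nodeId)) return false; // already being handled
        }
        try
        {
            work();
            return true;
        }
        finally
        {
            // Always release the id, even if 'work' throws
            lock (_lock) { _inProgress.Remove(nodeId); }
        }
    }
}
```

The event handler would wrap its "generate alias and save" step in TryRun(sender.Id, ...); the nested event raised by that save then returns false and does nothing. If the event is raised later on another request instead, a guard like this would not catch it, and comparing the stored alias with the newly computed one before saving would be needed.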
Hi guys,
I know this is a post from a while ago, but I was having issues with this too and it looks like this works:
This is returning the URL for items that previously had not been published while they're being published.
I hope this is helpful.
Hello again,
It looks like I've been mistaken in my eagerness and exuberance.
I was doing this in the Published event which I should not have been. Changing to Publishing event means the GetUrl method no longer works.
It would appear that previously I was unable to get the item as IPublishedContent but I WAS able to use GetUrl because although the item was not in the cache, it WAS published.
Back to the drawing board for me, but a good lesson learned :)
Sorry if I got anyone's hopes up!
I am creating websites for clients who need to know the URL of pages before they are published (media with the appropriate URLs needs to go out beforehand).
Being able to get the URL this way would be an option, instead of putting down URL aliases.