I've read through a number of threads here about the distributed calls, how they work, and how Examine is updated on the distributed servers via AfterUpdateDocumentCache.
My issue is that in our load balanced environment we have no direct http access to the individual load balanced boxes, so we cannot use the distributed calls approach outlined in the Umbraco documentation. We only have access to the "main" authoring box that sits behind the firewall but is obviously hooked into the same database. We are using DFS to copy files, etc across to the other boxes, so we do have that level of "access".
Without the distributed call approach being available to us, my approach so far (PoC) is to call the following every 5 minutes: ApplicationContext.Current.ApplicationCache.RuntimeCache.ClearAllCache(); (I was calling umbraco.library.RefreshContent() - both seem to work)
I call the above on app startup and then every 5 minutes on each server (on the 0 and 5 minute marks). Before calling the above I grab the latest updateDate from cmsDocument where published = 1 and compare it to a static DateTime I store, so that I'm not unnecessarily updating the cache if nothing has actually changed.
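To make that concrete, here is roughly what my 5-minute task does (the static field and the raw SQL are my own plumbing, not core Umbraco APIs):

```csharp
// Hedged sketch of the 5-minute change check described above.
// _lastKnownUpdate and the raw SQL query are my own plumbing.
private static DateTime _lastKnownUpdate = DateTime.MinValue;

public void RefreshIfContentChanged()
{
    // Latest publish date from the cmsDocument table (v6/v7 schema)
    var latest = ApplicationContext.Current.DatabaseContext.Database
        .ExecuteScalar<DateTime>(
            "SELECT MAX(updateDate) FROM cmsDocument WHERE published = 1");

    if (latest > _lastKnownUpdate)
    {
        _lastKnownUpdate = latest;
        // Rebuild the local xml cache from the database
        umbraco.library.RefreshContent();
    }
}
```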
The above works just fine and the Umbraco cache refreshes itself every 5 minutes on the "really dumb front end servers". It's also super quick, so I'm assuming there are some "smarts" behind the above calls that only update the changed items in the cache.
What doesn't work is updating the Examine indexes on each of the "really dumb front end servers". I understand that the distributed publishing calls usually look after this, so I know why it doesn't work, but I need some help figuring out an architecture to make it work. I'm assuming there is no magic bullet where I can call a single Umbraco method to magically update Examine (without having to re-index the whole thing).
Maybe I can capture in a custom db table the same thing that the distributed calls send out (I'm not entirely sure what they do send). Then, in my "5 minute scheduled task", I look in this custom table for any new distributed calls and manually "run them" to ensure Examine also gets updated. The issue is that I'm not sure how to hook into the distributed call event to "log" that data. Updating Examine by node id I can likely handle based on other examples in the forums. I also need to work out whether the above will work when/if a new "dumb front end" comes online.
Any other ideas or issues with the above architecture?
As some other background, the site is a 30,000 node site with a (currently) 60MB umbraco.config file.
First thing is that ApplicationContext.Current.ApplicationCache.RuntimeCache.ClearAllCache() is not the same as umbraco.library.RefreshContent().
You should be careful about when you call ApplicationContext.Current.ApplicationCache.RuntimeCache.ClearAllCache, as it is basically synonymous with clearing the entire HttpRuntime cache. That means you are effectively removing all runtime cache, which forces anything that uses it (quite a few things) to go re-fetch its raw data.
This call: umbraco.library.RefreshContent() makes a distributed call to all servers, and on each server it rebuilds the entire umbraco xml cache file based on what is in the database. This is quite an intensive operation and will be triggered on every server taking part in your LB scheme. If you say that this server cannot make distributed calls to the other boxes, then I can only assume that the xml cache rebuild will only occur on the local machine - but you'd have to verify that. If you have many documents, this is an intensive call and should definitely not occur every 5 minutes.
Perhaps a better idea rather than scheduling a rebuild of the xml cache every 5 minutes is to use DFS to your advantage and create a file queue of publishing events. On your main publishing/admin server whenever a doc is published or unpublished you could write a file, this file would get synced to all servers. Then on each server you can poll for new changes that haven't been processed locally. Each file could just contain an Id and if it is new/updated/deleted. Then handle it appropriately. Or you could just create another database table to handle this queue.
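As a rough sketch of the file-queue idea (the queue path, file naming and event wiring here are illustrative, not a prescribed implementation):

```csharp
// Illustrative only: write one queue file per published document on the
// admin server; DFS then syncs the folder to every front-end box.
public class PublishQueueEvents : Umbraco.Core.ApplicationEventHandler
{
    // Assumed DFS-replicated folder
    private const string QueueFolder = @"\\server\publish-queue";

    protected override void ApplicationStarted(
        UmbracoApplicationBase umbracoApplication,
        ApplicationContext applicationContext)
    {
        Umbraco.Core.Services.ContentService.Published += (sender, args) =>
        {
            foreach (var entity in args.PublishedEntities)
            {
                // File name encodes time, node id and action so each
                // server can process entries in order
                var fileName = string.Format(
                    "{0:yyyyMMddHHmmssfff}-{1}-publish.txt",
                    DateTime.UtcNow, entity.Id);
                System.IO.File.WriteAllText(
                    System.IO.Path.Combine(QueueFolder, fileName),
                    entity.Id.ToString());
            }
        };
    }
}
```

An unpublish/delete handler would write the same kind of file with a different action, and each front end would poll the folder for entries it hasn't processed yet.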
If a new server comes online, you'll have to rebuild all indexes locally to that machine, same goes for the xml file. It will be tricky to ensure this server is precisely up-to-date though if you are bringing servers online dynamically with active editors in the back office. Perhaps based on when it comes online and when the rebuild process has started/completed (for both examine and the xml file), you could then check the queue and see if there's been any activity from when the rebuild process started, if there is you could just execute the last 'x' items in the queue which should in theory make sure your local environment is up-to-date.
Just a thought: You might just be able to use the existing data in the umbracoLog table for this stuff, but would need to verify that all the info you'd need is put in there, otherwise you could probably just add your own information to this table for the queue to work.
You are correct that we cannot do distributed calls - the db based approach seems like the best one and good thought on using the log file as a potential "already there" solution. I was hoping for a built-in Umbraco approach to all this, but looks like I have to hook up all the plumbing myself to get this to work but at least the connections/events are available. Is there anything to gain from hooking into the distributed calls somehow even though they wouldn't go to a physical server (capturing them and putting them in a db table) over just hooking into the normal publish/unpublish events?
Based on the above it seems the pseudo-code is:
We have a couple of internal interfaces that haven't been made public yet: IServerMessenger and IServerRegistrar. If you wanted to live on the edge, you could use a custom build based on the stable version you are using and make these public. The IServerMessenger is the thing that performs the distributed calls to each server. You could create a custom version of it to do what you want, but it'll take a bit of work. If none of your front-end servers receive distributed calls via Http, there may be other cache invalidation issues beyond content that affect them as well (depending on what you are doing on the admin server), so it may be worth going down this route.
You could look into the code of DefaultServerMessenger and swap out the http calls with calls to add items to your 'queue'. In your app startup code you'd need to change the current IServerMessenger:
this needs to be done in your ApplicationEventHandler.OnApplicationStarting method.
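Assuming a custom build where those types are public, the swap would look roughly like this (QueueServerMessenger is your own IServerMessenger implementation; the resolver name follows the core's resolver pattern and may differ in your version):

```csharp
public class QueueMessengerEvents : Umbraco.Core.ApplicationEventHandler
{
    protected override void ApplicationStarting(
        UmbracoApplicationBase umbracoApplication,
        ApplicationContext applicationContext)
    {
        // Replace the default http-based messenger with one that writes
        // cache instructions to your own queue instead
        ServerMessengerResolver.Current.SetServerMessenger(
            new QueueServerMessenger());
    }
}
```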
In v6.2+ the library.UpdateDocumentCache(id) just calls this underneath:
This makes the distributed calls which will then cause the PageCacheRefresher to execute on each server which calls this code:
Thanks for the suggestions - I think I have enough to go on (after making some decisions). I took a look at the umbracoLog table and it looks like the most promising thing so far (I have yet to look at media, though). The code examples above (and then searching the core code) help a lot!
Where is the "similar" code on the distributed servers that takes an id and refreshes that one examine doc?
Media isn't properly done for examine (yet), there's an outstanding issue here:
For content, this is done by subscribing to these events:
content.AfterUpdateDocumentCache += ContentAfterUpdateDocumentCache;
content.AfterClearDocumentCache += ContentAfterClearDocumentCache;
These are older events that actually fire on each server after a distributed call but should be changed over to use this event: CacheRefresherBase
There's a thread about this here:
If you have a look into the Umbraco.Web.Search.ExamineEvents class you can see how it is done
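Roughly, the handlers in there re-index or remove the single node (simplified from what ExamineEvents actually does; check the class for the full logic):

```csharp
// Simplified from Umbraco.Web.Search.ExamineEvents: re-index a single
// node after the local cache is updated, remove it when it is cleared.
static void ContentAfterUpdateDocumentCache(Document sender, DocumentCacheEventArgs e)
{
    ExamineManager.Instance.ReIndexNode(
        sender.ToXDocument(true).Root, IndexTypes.Content);
}

static void ContentAfterClearDocumentCache(Document sender, DocumentCacheEventArgs e)
{
    ExamineManager.Instance.DeleteFromIndex(sender.Id.ToString());
}
```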
Perfect - exactly what I was looking for (had seen that thread previously too). Info above and the other thread should get us to a solution!
Great! I would be interested to see the outcome; we've discussed having distributed calls and Examine indexes updated via a queue for LB circumstances such as this. It also allows for easier addition and removal of servers in real time (another reason why IServerRegistrar was created). Of course we haven't pursued this idea yet as there's been no time, but I think the foundations are there so that it could work. Good luck!
I will be sure to report back on a solution (or failure :-)). Will buy you a beer in June!
My solution is getting close with all the hard work done - I have an almost working PoC on Azure Websites that scales out and does the distributed calls to update the Umbraco Cache and Examine.
To finish off the PoC and release to the community I need a couple of things:
You can create your own class that inherits from Umbraco.Web.UmbracoApplication and change your global.asax to inherit from that. Then in your class you can override OnApplicationEnd - though I'm not sure what services will actually be available during that event.
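Something like this, as an untested sketch:

```csharp
// Untested sketch: global.asax would be changed to
// <%@ Application Inherits="MySite.MyUmbracoApplication" Language="C#" %>
public class MyUmbracoApplication : Umbraco.Web.UmbracoApplication
{
    protected override void OnApplicationEnd(object sender, EventArgs e)
    {
        // Do your shutdown work here - but note that many services may
        // already be unavailable at this point
        base.OnApplicationEnd(sender, e);
    }
}
```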
On some other notes. This issue is completed: http://issues.umbraco.org/issue/U4-3937
You'll also be pleased to see this rev, in which we've made IServerMessenger and IServerRegistrar public:
You'll also see this new branch:
I'll be writing up a blog post about this and what it does soon but here's the gist:
Cool! - a few questions:
With this new setup, there is no notion of server registrations, the servers just need access to the database and they poll when necessary to process the instructions stored there. You can add/remove servers all you want, none of them need to know anything about the other ones.
The batching stuff is the fix for this issue:
which is an enhancement to how LB currently works and severely limits the amount of chatter between servers, especially when dealing with permissions. Basically, instead of sending multiple instructions, it batches them into one request if multiple distributed calls were required during a single request.
Is your batchdistcalls solution on the roadmap to be included in the core?
We're looking at hosting our website on Azure in a few months and are in need of load balancing.
Thanks for the feedback!
Yeah, there's code in a custom branch here:
The batching of distributed calls is working and is the first commit in that branch, the rest is a proof of concept which requires a lot more testing. Something I'll be getting around to on Fridays if I have time.
By "Azure" what do you mean? Azure Websites (WAWS) or normal Azure (VMs) ? We do not currently support Azure Websites for load balancing. There's been plenty of discussion around this, some of which is on the google groups dev mail list. There's some barriers with WAWS that prevent Umbraco from running OOTB with load balancing. We have some POCs working for it but these require testing. One of these POCs is this git branch.
I meant Azure Websites. I thought the batching of distributed calls was also targeting WAWS. I didn't look into the details yet; I assumed that running VMs would have the same setup as non-Azure hosting...
Here's the barriers to WAWS:
So OOTB Umbraco will not work with load balancing on Azure Websites, but as mentioned there are a few POCs that I've created that seem to work but require more testing and implementation.
Here's some more light reading around the subject:
I plan on writing a detailed blog post about all of this soon.
Great feedback, thanks!