Performance and Caching
We are about to launch a new version of a website that we expect to be popular. On a good day, there might be 3000 pageviews in an hour, and we are doing our best to prepare for traffic spikes.
To prepare, we have run three different stress tests:
Test 1: Umbraco caching on, macro caching off. Peaked at approximately 300 concurrent users before timeouts began to occur.
Test 2: Umbraco caching on, macro caching on. Peaked at approximately 300 concurrent users again.
Test 3: The older static (non-Umbraco) website. Peaked at approximately 680 concurrent users.
Can someone explain to me in high-level terms how content is retrieved and displayed to a user for a typical Umbraco page? I'm also interested in how many database calls are required to display a typical Umbraco page when caching is turned on, and in why caching the macros did not have a performance impact.
From what I understand, if caching is enabled, Umbraco will pull content from the umbraco.config file, limiting the need for database interaction. Is this correct? If so, why was the static site's performance over twice as good?
Hi Mike!
All content in the Umbraco content section is stored in umbraco.config. This file is held in memory and queried against when Umbraco serves a page. This should be super ultra mega fast! =)
Depending on what your macros are doing and how long the cache timeout is, this could have a big impact on performance. Could you try longer timeouts? You could add output caching as well.
Dynamic content vs. static content makes a big difference. A request has to go down the ASP.NET pipeline, which takes some time.
What kind of machine are you running?
Hi Markus, thanks for the reply.
The machine is a cloud VM with 4 CPUs and 4 GB of RAM. The data is coming from a SQL Express database on the same server. I've been wondering whether SQL Express is having a negative impact on performance, but with caching I wouldn't think it would be an issue.
Between test 1 and test 2, I set the macros' cache period to 3600 seconds, so I was pretty surprised when the second test ran and I didn't see any performance impact. The vast majority of the macros are XSLT, using calls like GetXmlNodeById or recursive selections for the menus.
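To give an idea, a typical one looks roughly like this (heavily simplified - the node id 1234 is just a placeholder and the real macros are a bit more involved):

    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:umbraco.library="urn:umbraco.library"
      exclude-result-prefixes="umbraco.library">

      <xsl:output method="xml" omit-xml-declaration="yes"/>
      <xsl:param name="currentPage"/>

      <xsl:template match="/">
        <!-- GetXmlNodeById reads from the in-memory XML cache (umbraco.config), not the database -->
        <xsl:variable name="home" select="umbraco.library:GetXmlNodeById(1234)"/>
        <ul>
          <xsl:for-each select="$home/*[@isDoc and string(umbracoNaviHide) != '1']">
            <li>
              <a href="{umbraco.library:NiceUrl(@id)}">
                <xsl:value-of select="@nodeName"/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
      </xsl:template>
    </xsl:stylesheet>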
The only macro that is a user control and appears on every page is a very simple script that displays the year.
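It's essentially nothing more than an inline expression along the lines of:

    <%= DateTime.Now.Year %>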
How does output caching differ from storing the XML data from umbraco.config in memory?
Thanks again, this is all great info.
Hi again Mike!
SQL Server Express is limited to 1 GB of RAM, and it also has a limitation on the number of concurrent connections to the database - I think 1-2 open connections, but I'm not sure. More about this here: http://stackoverflow.com/questions/59080/sqlserver-express-slow-performance and here http://social.msdn.microsoft.com/Forums/en/sqlexpress/thread/baad0518-bffe-46e6-867b-22ee0ac4c867
SQL Server Express could be the problem. But at the same time the cache issue is interesting - turning on caching for the macros should have at least some impact. The GetXmlNodeById method talks to the XML cache, so that should be quite fast.
About output caching: when Umbraco serves a page that isn't cached, it processes the XML data that's stored in memory (this takes very little time, but it's done for every request). If you turn on output caching with a duration of about 1-3 seconds, that can have a big impact on a high-traffic site, because ASP.NET will process the page once, cache it, serve the cached version to every request for x seconds, then reset the cache and process it once again. Do you see the difference?
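For example, the standard ASP.NET output cache directive looks like this on an .aspx page or .ascx user control (the two-second duration is just an illustrative value, and exactly where you hook it into an Umbraco site depends on your setup):

    <%@ OutputCache Duration="2" VaryByParam="None" %>

With that in place, ASP.NET renders the output once, serves the cached copy for two seconds, then renders it again.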
Thanks again for responding Markus,
I was aware of the 1 GB memory limit, but as far as I had heard, SQL Express has the same connection limit as regular SQL Server. I can't find a definitive answer on this yet, however. It's difficult to tell from the data I have whether SQL Express is causing the bottleneck. Sure, it's limited to 1 GB of memory, but if everything is cached, why would that matter?
I'm not sure why caching the macros didn't have a performance impact either. Each page has approximately 3-5 macros.
I'll definitely check out output caching. It sounds like it certainly wouldn't hurt. Thanks for this info.
Hi Mike,
As Markus says, the actual content comes from an in-memory XML file (umbraco.config). The only time the front end should hit the database is if:
1) The content cache needs to be rebuilt - I THINK (although don't take this as gospel) that this happens on app start-up, and is fired by some back-office actions as well (such as sorting a list of pages).
2) If you have the XML cache disabled (you can do this in the umbracoSettings.config file - there's a snippet after this list).
3) If you've got some custom code on the site that makes database calls. Either your own, or a package such as Contour (which is unlikely to be on every page). Some of the tagging library methods make database calls as well, so if you're making extensive use of some of the tag library functions, that could be hitting the database.
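For reference, the relevant setting lives in the <content> section of umbracoSettings.config - the element name below is from memory for Umbraco 4.x, so double-check it against your own file:

    <content>
      <!-- when True, front-end requests are served from the in-memory XML cache (umbraco.config) rather than the database -->
      <XmlCacheEnabled>True</XmlCacheEnabled>
    </content>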
I'm surprised that caching isn't making any difference for you; on the sites that we've tested, it seems to increase the number of concurrent requests the site can handle. What version of Umbraco are you running? There are some issues to do with macro caching in some of the more recent versions that are being fixed at the moment. There's a pull request with a fix here: http://umbraco.codeplex.com/SourceControl/network/Forks/florisrobbemont/MacroCachingHeavyLoad/contribution/1830 and you can see the (very long) thread about it in the forum here: http://our.umbraco.org/forum/developers/api-questions/8584-Severe-Issue-with-macro-caching-under-heavy-load
For the static content stuff: static content will ALWAYS perform better than dynamic content, be it ASP.NET, PHP, Java, Ruby or whatever else you care to use. Why? Because when the web server gets a request for a static item (HTML page, image, whatever) that doesn't have a server-side processing element, it just checks if the file exists and throws it out to the user, and that's it. With dynamic content, there is the overhead of running the runtime/parser for the language, in addition to any other bottlenecks introduced by things like database/file access, let alone things like inefficiently written code. For a .NET WebForms page, for example, the page lifecycle is pretty complex: everything has to pass through the .NET pipeline, then all the controls must be loaded and rendered, until you eventually get some output HTML which is sent back to the browser. Dynamic pages using server-side languages are powerful and allow you to do an awful lot more than you can with simple static pages, but that comes at a price in terms of server resource usage and request handling.
If you have something like dotTrace that allows you to run traces on the server when it's busy, you might be able to spot any major performance issues. It doesn't sound like your hardware should be a problem. It could be SQL Express, which, in addition to the limits mentioned by Markus, will only use a single processor (one physical CPU, however many the machine has). To check whether that's the issue, try running the stress tests on a dev box that has the full version of SQL Server on it (if you don't have it, you can always grab the 30-day trial and install that on a dev box). Then try stress testing with each database and see if there is a noticeable difference in performance (as the front-end content should be coming from the XML file, it shouldn't make too much difference, but it's worth checking anyway, just to rule it out). Another thing could be particularly inefficient XSLT in your macros: if there's a macro with some poorly optimised XSLT queries, that could hamper the performance of the site. You can try disabling all of the macros on the site, test and see what kind of numbers you get, then slowly re-enable each macro and see if there's one in particular that causes problems.
You can also add umbDebugShowTrace=true to the querystring to get the full page trace so that you can see if there's a Macro that's taking a long time to run.
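For example (the domain and page are made up): http://www.example.com/some-page.aspx?umbDebugShowTrace=true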
Hope that helps!
:)
Thanks Tim, this is all excellent info. I'm running a pretty recent version: umbraco v 4.7.1.1 (Assembly version: 1.0.4393.24044)
I've listed the trace information below, but as far as I can tell everything is loading very quickly. Is this your opinion as well?
Trace Information (trace output omitted)
Hi Mike,
Yup, that all looks like it's loading nice and quickly - nothing leaps out as being a problem from that trace.
For your stress tests, are you hitting all of the pages on the site, or just a few? It could be that there's a particular page that's slowing things down (say, if there's a long-running query or complex task on it); it might be worth testing a few pages at a time to see if that makes a difference.
If you google your server OS and IIS versions with something like "windows server 2008 maximise concurrent connections", you should get some useful information on optimisations that you can make to increase the number of concurrent users too.
Another thing to bear in mind is that 3000 hits an hour works out at about 50 hits a minute (3000 ÷ 60), or less than one hit a second. 300 concurrent requests is 300 requests all at the same instant in time, which, unless you're getting crazy traffic, you probably won't see under normal operation. To be hitting 300 requests a second, you'd be looking at around a million hits an hour (300 × 3600 = 1,080,000).
That said, it's still worth trying to squeeze as much performance as you can out of the hardware - I'd have thought you should be able to get more than 300 requests a second out of your hardware.