Unexplained Spike in IIS Server CPU Activity (Nearly 100%) Causing Outages
We have had a series of high CPU events causing our server and all our sites to time out. These server outages range from 5-15 minutes, with pages sporadically becoming available during this interval. In some cases, the problems go away on their own; in others, we must restart IIS. When these events occur, IIS CPU usage spikes to 95-99.99%.
We initially identified the extensive logs Umbraco keeps, as well as the versions it stores of nodes that have been changed, as related to the outages. We now prune these every two weeks to keep their footprint down. In this case, our IT department initiated a support incident with Microsoft to capture the exact state of the IIS server, using some elaborate utilities, to determine exactly what is causing the high CPU events.
In terms of hardware, we have 8 high-performance CPUs and generous memory and disk space in 4 VMs running in a state-of-the-art, Dell-run server farm. No sooner was a system in place to analyze the outages in real time than the outages stopped. The Umbraco SQL database lives on a different server. We have our own error notification system, and all the errors are attributable to IIS, not the database.
After we emptied both recycle bins, the high CPU events stopped. Perhaps a coincidence, but more likely this action had an effect.
After emptying the media library recycle bin, the \media directory on the server holds 1.6 GB in 89,000 files representing nodes, most likely including multiple versions of the same file.
In contrast, according to the widely used Umbraco utility "Bulk Manager" by Richard Soeteman, the Media Library in Umbraco has only 6,306 files. I don’t know the average size, but at 100 KB each this would translate to about 600 MB. There are about 14 times as many files on disk as the media count derived from Bulk Manager (89,000 / 6,306 ≈ 14.1), or roughly 1,400%.
Does anything about the above, most notably the media library status, suggest a potential source for our outages?
We are in the process of upgrading from 4.11 to the last production version of Umbraco 6, with an eye to moving to Umbraco 7 as soon as Umbraco 6 is stable.
I have seen similar issues, and it's a real pain to track down exactly what is causing them, so I'll just list every solution I know of:
Update Windows. Include the "Optional" (or recommended, or whatever they're called) updates. There was an update that fixes an issue with how files are watched, or something of that sort. This works a large percentage of the time.
Don't use Contour. There was a site that periodically had database connectivity issues, usually starting with an entry in the Umbraco log indicating Contour couldn't connect to the database, possibly due to the connection pool filling up. One reason Contour may be causing issues is that it interacts with the database poorly (e.g., I had one site in which it made 3,200 SQL queries on a single form submission).
Upgrade Umbraco. I was experiencing a CPU spike and the very latest version of Umbraco 6 (6.2.6) contains a single fix for that particular issue.
Store image thumbnails outside of the web root. This is only really possible with the latest 7.x version of Umbraco. If too many media thumbnails are generated, it can cause the website to restart when the file change buffer fills up.
Set fcnMode="Disabled" in the web.config. This is a workaround so the file change buffer doesn't fill up.
@Nicholas - Thank you! re: Contour, I agree, it's got a lot of problems and is really rough around the edges. However, we are not having database connectivity issues. In any case, is its successor, Umbraco Forms, any more reliable?
We're on Windows Server 2008 R2 - any potential benefit from going to Windows Server 2012?
We're definitely upgrading Umbraco.
I am not an IT person nor a programmer. What would "Set fcnMode="Disabled" in the web.config" do, and how might that mode be causing the outages?
In any case, is its successor, Umbraco Forms, any more reliable?
Probably "more" reliable, but I'm not sure if I'd call it reliable. I have been using alternatives to Contour/Umbraco Forms for some time. I do know that there are issues I reported that have not been marked as fixed.
Note that I am very much biased in this regard (I have a bit of history with Contour), so it might be worth asking this of others so they can provide a more objective opinion.
We're on Windows Server 2008 R2 - any potential benefit from going to Windows Server 2012?
Probably not, but you could give it a go. Just be sure to apply all Windows updates.
What would "Set fcnMode="Disabled" in the web.config" do, and how might that mode be causing the outages?
In short, FCN stands for "File Change Notification". The website is supposed to restart when certain files change (e.g., DLLs, CSHTML files). This is because a change to those files implies a functional change, which takes effect when the website restarts. The problem is that information about the watched files is stored in a memory buffer of limited size. When there are too many files (and folders) in that buffer, the website restarts to clear it out.
Note that there is a caveat to setting this to disabled. After you deploy changes to the website, you may need to manually recycle the application pool to ensure your changes take effect right away.
Here's where you make that change in the web.config:
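The snippet itself did not come through with the post. As a minimal sketch, assuming a standard ASP.NET web.config and .NET Framework 4.5 or later on the server (fcnMode is not recognized before that), the attribute goes on the existing httpRuntime element inside system.web. Whatever other attributes your httpRuntime element already has are not shown here and should stay as they are:

```xml
<configuration>
  <system.web>
    <!-- Assumption: your site already has an httpRuntime element.
         Keep its existing attributes and only add fcnMode. -->
    <httpRuntime fcnMode="Disabled" />
  </system.web>
</configuration>
```

Saving web.config normally triggers an application restart on its own, so the setting should apply from that restart onward. After that, remember the caveat above about recycling the application pool after deployments.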
Appreciate any insights the forum can provide.
Hopefully one of those will work.
re: Umbraco 6 (6.2.6) contains a single fix for that particular issue.
What is that single fix?
Thanks.
This post explains File Change Notification (the fcnMode setting) in more detail: https://shazwazza.com/post/all-about-aspnet-file-change-notification-fcn/
FYI, you can see the issues addressed in a particular version of Umbraco here: https://our.umbraco.org/contribute/releases/626/
Thanks again. What do you use in place of Contour/Umbraco Forms?
Rather than point you to a specific solution, I'll just link to this, which lists some alternatives: https://github.com/leekelleher/awesome-umbraco