Site crashes randomly 100% CPU usage Umbraco 6.2.4
We have an Umbraco site running 6.2.4, and the site randomly falls over. The W3WP process for the site runs at 100% CPU usage and any request to the site gets a SQL Server initial handshake error.
Has anyone got any clues for resolving this problem, or suggestions for where I can start looking?
Thanks in advance
Mark
Hi Mark
Have you checked the Event Viewer on the server? It might also be a good idea to check the files in the /App_data/logs folder to see if they reveal anything marked with "ERROR" or "WARNING".
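The log check can be scripted rather than done by eye. The following is only a sketch, written in Python for brevity. It assumes the log files are plain-text `*.txt` files and that each line carries its level as a standalone word (`ERROR`, `WARN` or `WARNING`). The function names and the folder path in the usage note are made up for illustration, so adjust them to your install.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a log line carries its level as a standalone word, e.g. "... ERROR ...".
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARN(?:ING)?)\b")


def scan_lines(lines):
    """Return (counts per level, matching lines) for an iterable of log lines."""
    counts = Counter()
    hits = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            # Fold "WARNING" into "WARN" so both spellings are counted together.
            level = "WARN" if match.group(1).startswith("WARN") else "ERROR"
            counts[level] += 1
            hits.append(line.rstrip("\n"))
    return counts, hits


def scan_folder(folder):
    """Scan every *.txt file in `folder` (hypothetical log location) and merge the results."""
    total = Counter()
    all_hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            counts, hits = scan_lines(handle)
        total.update(counts)
        all_hits.extend(f"{path.name}: {hit}" for hit in hits)
    return total, all_hits
```

Calling `scan_folder(r"C:\path\to\site\App_data\logs")` (a placeholder path) returns the totals per level plus the matching lines, which makes it easier to see whether anything other than the usual noise shows up around the time of a crash.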
Has the site been running fine until now, or has it just been released? If it has been running fine, what changes have been made since? Is debug set to "false" in the web.config? Is there more than one connection string in the web.config, for example if you're fetching data from other sources?
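Both web.config questions can be answered with a few lines of script. This is a rough sketch in Python using only the standard library. It assumes the usual ASP.NET layout, with a `<compilation debug="...">` element and `<add>` entries under `<connectionStrings>`. The function name `inspect_web_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def inspect_web_config(xml_text):
    """Return (debug_enabled, connection_string_names) for a web.config document."""
    root = ET.fromstring(xml_text)

    # Take the first <compilation> element; a missing element or attribute counts as debug off.
    compilation = next(root.iter("compilation"), None)
    debug_attr = compilation.get("debug", "false") if compilation is not None else "false"
    debug_enabled = debug_attr.strip().lower() == "true"

    # Each <add name="..."> directly under <connectionStrings> is one configured connection string.
    names = [node.get("name") for node in root.findall("./connectionStrings/add")]
    return debug_enabled, names
```

To use it, read the file first, for example with `Path("web.config").read_text(encoding="utf-8-sig")`, and pass the text in. A result of `(False, [<one name>])` matches the setup described in this thread: debug off and a single connection string.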
/Jan
Hi Jan,
I've had a look through the logs and a couple of errors are showing, but these are just Null Reference Exceptions from a generic handler that's on the site. Apart from this, there's nothing obvious. The NREs are showing in the Event Viewer as well.
Debug is turned off. The site is relatively new and has had problems ever since it's been under load.
Only 1 connection string.
Hi Mark
Are you using XSLT or Razor for rendering the content? And is it generally slow or is it a certain couple of pages that are really slow? Are you using MVC or Webforms in the solution?
/Jan
Any packages installed on the site?
The site is using Webforms and there are quite a few User Controls that render the pages from content. Nothing appears to take any longer than it should.
Packages that we use on the site are uComponents & uTweets
Hi Mark
Have you tried enabling UmbDebug to see if it gives some hints?
/Jan
Hi Jan,
I have, but it's so random that it's impossible to find. Even the UmbracoTraceLog files don't yield any clues!
Hi Mark
Ok... no spikes that raise an alert? Something that sometimes takes way too long to load? Have you made any integrations with a 3rd party somehow?
What are the server specs? How much CPU power and RAM?
/Jan
Did you ever find the cause of this? We're getting the same issues on an Umbraco 7.1.8 installation. We have a load-balanced site running on 2 Windows Server 2008 R2 boxes. Each server has 2 CPUs and 3GB of RAM.
Thanks, Aileen
Hi - I've been getting this on a site that runs Umbraco 7 - I assumed it was my fault as it has some custom code for most pages - but on Friday a site that has only a small amount of that code and is currently on Umbraco 6 suddenly went to 100% with HTTP timeout errors from this site and SQL timeout errors from other sites. This took down the whole production server until I killed the process, so I'm eager to work out what the problem is... I've never seen this in development so am not holding out much hope of getting a reproduction easily :(
I only upgraded from U 4 to 6 a couple of months ago, and haven't made any structural changes since, so I think this didn't happen in 4.
One of the sites uses WebForms, and the other MVC. But there's a common custom component between them - which suggests it's not Umbraco's 'fault'. It's annoying how bad this error is, though, because assuming it's my code which has caused a deadlock in the database (or similar) - why would it take 100% CPU until the threads eventually time out? This means nothing else on the server (even basic Windows functionality!) can get any processor cycles. And these aren't heavy traffic sites. Certainly the first few times I saw it was on a test server that only 2 people knew about - I've seen some nasty deadlocks in SQL before - but this is terrible! ;)
I've set its CPU affinity so it should only kill 1 processor and associated site, but I need to work out what's going on and I'm not sure where to start.
I didn't get a stack dump this time. Last time it happened in testing I did get a stack dump but can't see any reference to my custom library or even Umbraco - but I have no idea how to analyse dump files!
Hi guys,
We had a similar issue lately with 'random' CPU spikes. It turned out to be caused by Microsoft update KB3000850, which had been installed. See some other forum threads (https://our.umbraco.org/forum/core/general/63671-Application-shutdown-Reason-ConfigurationChange, http://issues.umbraco.org/issue/U4-6338, https://our.umbraco.org/forum/core/general/63922-Application-shutdown-Reason-various).
This happened on a Windows Server 2012 R2, so I don't know if it's related to your issues, but maybe it sets you on the right track.
Greetings, Jeffrey
The server I've just seen this on is Server 2012 R2 - but the previous one was actually Windows 7 Pro! I'm not at all convinced this is the same issue - but I don't have INFO level logging as this is a production server/site.
I've tried turning on info level on our test site and can't see any weird CPU spikes or messages in the log when publishing there.
I also tried installing the patch from:
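But got:
Starting installation of hotfix KB3052480
Checking for expiration of the hotfix
ERROR: The test signed hotfix you are trying to install has expired. Please contact Microsoft Support to get a newer version.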
Has this now been patched officially or something?
Hi Ian, the KB3000850 patch should NOT be installed, if I'm correct...
Well - I could uninstall KB3000850, but the better solution seemed to be the testing patch in http://issues.umbraco.org/issue/U4-6338 - but it's expired, apparently :(
@Ian - that's really odd. Two of our Umbraco sites started exhibiting this behaviour on Friday. We had upgraded them to 7.2.5 earlier in the day so presumed it was that; however, rolling one back to 7.2.4 hasn't helped.
They're both on Windows Server 2012 R2 platforms, IIS 8.5, 8-core @ 2.6GHz w/ 8GB RAM.
One of the sites is far less popular than the other and sits at around 35% CPU constantly - much higher than it should be - whereas the popular site slowly climbs to 100% from 3% over the course of 0.5 - 1 hour, then remains there. This seems to indicate that the issue can be exacerbated by higher traffic.
I was pointed in the direction of update KB3000850. However, that update has now been uninstalled and the issue remains.
The rather odd thing is that we have a staging site which serves multiple Umbraco instances; it runs the same codebases as the live sites, runs a nightly copy of the dbs from the live sites and is also S2012R2, yet we don't see the issue there at all.
The only error in the logs is regarding scheduled publishing:
Help would be greatly appreciated on this one as it's drastically affecting the performance of active sites.
Regards, Chris.
Doesn't sound like the same thing as me. The CPU use is pretty predictable, and I've not seen this constant 'configuration changed - reloading' message. I have seen that error with scheduled publishing, however...
No I don't have the configuration changed error either.
I've still got this happening. Been tearing my hair out for months. I've done a little more investigation and found that it could be a dictionary problem.
I did what it said in this post:-
http://www.iis.net/learn/troubleshoot/performance-issues/troubleshooting-high-cpu-in-an-iis-7x-application-pool
After having a look through what it produced when the process hangs, I looked at the top 5 threads by CPU time and found that all of the .NET call stacks start with:
mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.Boolean, mscorlib]].Insert(System.__Canon, Boolean, Boolean)+1d7
Here's the complete call stack from one of the traces:
mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.Boolean, mscorlib]].Insert(System.__Canon, Boolean, Boolean)+1d7
Umbraco.Core.Models.EntityBase.TracksChangesEntityBase.OnPropertyChanged(System.Reflection.PropertyInfo)+3e
Umbraco.Core.Models.EntityBase.TracksChangesEntityBase.SetPropertyValueAndDetectChanges[[System.__Canon, mscorlib]](System.Func`2, System.__Canon, System.Reflection.PropertyInfo)+5c
Umbraco.Core.Models.ContentType.set_AllowedTemplates(System.Collections.Generic.IEnumerable`1)+80
[[DebuggerU2MCatchHandlerFrame]]
[[HelperMethodFrame_PROTECTOBJ] (System.RuntimeMethodHandle.InvokeMethod)] System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean)
mscorlib_ni!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(System.Object, System.Object[], System.Object[])+4c
mscorlib_ni!System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)+d3
mscorlib_ni!System.Reflection.RuntimePropertyInfo.SetValue(System.Object, System.Object, System.Object[])+2c
Umbraco.Core.Models.DeepCloneHelper.DeepCloneRefProperties(Umbraco.Core.Models.IDeepCloneable, Umbraco.Core.Models.IDeepCloneable)+61d
Umbraco.Core.Models.ContentTypeCompositionBase.DeepClone()+3d
Umbraco.Core.Models.ContentTypeCompositionBase.b__2f(Umbraco.Core.Models.IContentTypeComposition)+14
System_Core_ni!System.Linq.Enumerable+WhereSelectListIterator`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].MoveNext()+f4
mscorlib_ni!System.Collections.Generic.List`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1)+218
System_Core_ni!System.Linq.Enumerable.ToList[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1)+50
Umbraco.Core.Models.ContentTypeCompositionBase.DeepClone()+1bb
Umbraco.Core.Persistence.Caching.RuntimeCacheProvider.GetById(System.Type, System.Guid)+e3
Umbraco.Core.Persistence.Repositories.RepositoryBase`2[[System.Int32, mscorlib],[System.__Canon, mscorlib]].TryGetFromCache(Int32)+bb
Umbraco.Core.Persistence.Repositories.RepositoryBase`2[[System.Int32, mscorlib],[System.__Canon, mscorlib]].Get(Int32)+3c
Umbraco.Core.Persistence.Repositories.ContentTypeRepository.PerformGet(Int32)+513
Umbraco.Core.Persistence.Repositories.RepositoryBase`2[[System.Int32, mscorlib],[System.__Canon, mscorlib]].Get(Int32)+7b
Umbraco.Core.Persistence.Repositories.ContentTypeRepository+df.MoveNext()+311
System_Core_ni!System.Linq.Enumerable+WhereEnumerableIterator`1[[System.__Canon, mscorlib]].MoveNext()+c4
System_Core_ni!System.Linq.Enumerable.FirstOrDefault[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1)+f7
Umbraco.Core.Services.ContentTypeService.GetContentType(System.String)+213
Umbraco.Core.Models.PublishedContent.PublishedContentType.CreatePublishedContentType(Umbraco.Core.Models.PublishedItemType, System.String)+87
Umbraco.Core.Cache.CacheProviderExtensions+<>c__DisplayClass9`1[[System.__Canon, mscorlib]].b__8()+f
Umbraco.Core.Cache.StaticCacheProvider+<>c__DisplayClass1a.b__19(System.String)+f
mscorlib_ni!System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].GetOrAdd(System.__Canon, System.Func`2)+53
Umbraco.Core.Cache.StaticCacheProvider.GetCacheItem(System.String, System.Func`1)+62
Umbraco.Core.Cache.CacheProviderExtensions.GetCacheItem[[System.__Canon, mscorlib]](Umbraco.Core.Cache.ICacheProvider, System.String, System.Func`1)+b4
Umbraco.Core.Models.PublishedContent.PublishedContentType.Get(Umbraco.Core.Models.PublishedItemType, System.String)+e6
Umbraco.Web.PublishedCache.XmlPublishedCache.XmlPublishedContent.Initialize()+846
Umbraco.Web.PublishedCache.XmlPublishedCache.XmlPublishedContent.get_SortOrder()+1b
System_Core_ni!System.Linq.EnumerableSorter`2[[System.__Canon, mscorlib],[System.Int32, mscorlib]].ComputeKeys(System.__Canon[], Int32)+93
System_Core_ni!System.Linq.EnumerableSorter`1[[System.__Canon, mscorlib]].Sort(System.__Canon[], Int32)+26
System_Core_ni!System.Linq.OrderedEnumerable`1+d0[[System.__Canon, mscorlib]].MoveNext()+152
System_Core_ni!System.Linq.Enumerable+WhereSelectEnumerableIterator`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].MoveNext()+a5
mscorlib_ni!System.Collections.Generic.List`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1)+218
System_Core_ni!System.Linq.Enumerable.ToList[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1)+50
umbraco.MacroEngines.DynamicBackingItem.get_ChildrenAsList()+52
umbraco.MacroEngines.DynamicNode.get_GetChildrenAsList()+24
umbraco.MacroEngines.DynamicNode.get_Children()+9
Not really sure what I am looking at here or where to find the problem. But on further investigation I find this:-
The MSDN documentation about Generic.Dictionary has the following information about the thread safety of Dictionary objects
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
What is happening here, and causing the high CPU, is that the FindEntry method walks through the dictionary, trying to find the key. If multiple threads are doing this at the same time, especially if the dictionary is modified in the meantime, you may end up in an infinite loop in FindEntry, causing the high CPU behavior, and the process may hang.
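The MSDN advice to "implement your own synchronization" comes down to routing every read and write of the shared dictionary through one lock. The sketch below shows that pattern only. It is written in Python for brevity, and Python's built-in `dict` does not spin the way .NET's `Dictionary` can, so it illustrates the locking discipline rather than the failure. `ChangeTracker` and `hammer` are hypothetical names; this is not Umbraco's code or its fix.

```python
import threading


class ChangeTracker:
    """Hypothetical stand-in for an entity that records which properties changed."""

    def __init__(self):
        self._changed = {}             # shared state: property name -> True
        self._lock = threading.Lock()  # one lock guards every access to _changed

    def mark_changed(self, property_name):
        # Writers take the lock so two threads never modify the dictionary at once.
        with self._lock:
            self._changed[property_name] = True

    def is_dirty(self, property_name):
        # Readers take the same lock so they never observe a half-finished write.
        with self._lock:
            return self._changed.get(property_name, False)

    def snapshot(self):
        # Copy under the lock; callers can then enumerate the copy without holding it.
        with self._lock:
            return dict(self._changed)


def hammer(tracker, worker_id, count):
    """Simulate one request thread marking `count` distinct properties as changed."""
    for i in range(count):
        tracker.mark_changed(f"prop-{worker_id}-{i}")


if __name__ == "__main__":
    tracker = ChangeTracker()
    threads = [threading.Thread(target=hammer, args=(tracker, n, 1000)) for n in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(tracker.snapshot()))  # 8 workers x 1000 distinct keys = 8000
```

In .NET the equivalent choices would be a `lock` around every access to the `Dictionary`, or replacing it with `System.Collections.Concurrent.ConcurrentDictionary`. Either way, the point is that an object shared between request threads must not expose an unsynchronised `Dictionary` to writers.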
So, I'm not really sure what my next step should be. Any ideas?
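One possible next step with a trace like this is to strip out the framework frames and see which application frames sit closest to the top of the stack. The following is a rough sketch in Python. It assumes the trace has been saved with one frame per line in the form `module!Namespace.Type.Method(args)+offset`, with the module prefix and offset optional. The function names and the list of "framework" prefixes are assumptions for illustration.

```python
import re

# Assumption: frames look like "module!Namespace.Type.Method(args)+offset".
FRAME_PATTERN = re.compile(
    r"^(?:(?P<module>\w+)!)?(?P<call>.+?)(?:\+(?P<offset>[0-9a-f]+))?$"
)

# Prefixes treated as framework or runtime frames; anything else is reported.
FRAMEWORK_PREFIXES = ("System.", "[[")


def parse_frame(line):
    """Split one frame line into (module, call, offset); module and offset may be None."""
    match = FRAME_PATTERN.match(line.strip())
    return match.group("module"), match.group("call"), match.group("offset")


def application_frames(trace_text):
    """Return the calls, top of stack first, that do not start with a framework prefix."""
    calls = []
    for line in trace_text.splitlines():
        if not line.strip():
            continue
        _, call, _ = parse_frame(line)
        if not call.startswith(FRAMEWORK_PREFIXES):
            calls.append(call)
    return calls
```

Fed the call stack quoted above, the first frame this returns is `Umbraco.Core.Models.EntityBase.TracksChangesEntityBase.OnPropertyChanged`, the method that calls `Dictionary.Insert`. Running it over each of the hot threads and comparing the first few results is a quick way to confirm they are all stuck in the same place.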
Hi Mark,
I have got the exact same issue as you. CPU would jump to 100% at random times. I took a dump of w3wp, and the first 5 threads were taking up to 10 minutes and showing the exact same stack trace:
System.Collections.Generic.Dictionary`2
Did you ever get to the bottom of this?
I deleted a couple of newly created pages on Saturday morning, republished the whole site, and haven't seen the issue since.
But not knowing exactly what caused it, or whether it has been fixed, is really bothering me, so I just want to know why you get this error.
Many thanks Ad
Hi,
I haven't got it completely resolved to date.
What I did do was to backport some fixes that went into 7.2.3 to 6.2.5.
That did fix the problem, but introduced some other issues when saving document types.
I will get around to it, but I've just been too busy on other projects of late.
Mark
Hi Mark,
Thank you for your prompt reply!
I am using 7.2.2 and I am thinking of upgrading to 7.3 as a friend of mine recommended that.
For me, deleting two newly created pages and republishing the whole site has kind of resolved this issue, as I haven't seen it since.
To be honest Umbraco is putting me off a bit with all these weird performance and other issues.
If I ever get to the bottom of this issue I will update you.
Many thanks Ad
7.2.8 is a bug-fix release and should be significantly easier to upgrade your instance to than the 7.3.0 Beta would be.
I'd recommend trying that first.
/Chris
Hi Chris,
Yes, you are right. It's been reported as a bug in Umbraco CMS in this post (you might have to Google Translate it):
http://knowledge-base.havit.cz/2015/03/19/debugging-story-race-condition-v-umbraco-cms-deepclone/
And the fix has been explained in this post:
http://issues.umbraco.org/issue/U4-6292
Mark, it's perhaps a good idea to look into these posts, as they explain the issues we are both having.
Thank you Chris & Mark Ad
Hi,
I've read the post that was posted on the issues portal and implemented Shannon's commits into 6.2.5.
Shannon's commits:-
https://github.com/Umbraco/Umbraco-CMS/commit/8905878a8785a71742dbce7441d445bf44bd3d24
https://github.com/Umbraco/Umbraco-CMS/commit/834b780d8ee8bab3f0003a06ac0c00e8b660fcc5
https://github.com/Umbraco/Umbraco-CMS/commit/9a042fbbdf5c785356572c1a239f4c4cc63007c3
https://github.com/Umbraco/Umbraco-CMS/commit/46212904ef797f398aba16cd3ed6f97c2745b77a
All is good, but it introduces other issues when saving document types. As I said before, I will have to try and fix it at some point, but I'm busy on other projects at the moment.
Mark