75-100 percent cpu utilization spikes with single site visitor
Third time is the charm. the % in the title made it into the post url, and browsers weren't allowing navigation to the post. Should probably fix that :)
Sorry if this is a dup. I waited 15 minutes for my other post to show up and it hasn't yet. Bummer. In any case, this post has a little more info.
Umbraco 4.0.3
Blog4Umbraco 2.0.24
Windows Server 2008
SQL Server 2008 (separate machine)
I recently (yesterday) imported 500+ blog posts and 2000+ comments from community server. I'll have several hundred more pages to go before the site is fully stocked. I had planned to go live with the blog this week, but this perf issue is a showstopper.
When hitting a single page, I'm seeing CPU spikes from 50% to 100%. That's loading the server, and making individual pages take several seconds to load. For example, this page had a 4 second CPU spike of 100%.
Once initially loaded, it's good for several minutes (presumably whatever the cache time is)
Most of my customization has been limited to skinning/templates. I've modified some xslt, but mainly just to change styles and layout. xml caching is turned on in the config. I have 500+ entries in the urlrewriting config file to map the old community server urls to the new umbraco ones.
I'm not running the image resizer (saw in other posts that it had a memory leak), and I'm not seeing anything that makes sense in the debug trace. Memory usage is pretty consistent at around 350mb. I see no errors in the windows event log.
The server (older dual proc xeon 2.4ghz) runs my blog under community server with CPU usage typically in the < 3% range, so the machine is capable.
Not sure if this is the problem, or a red herring, but I see a bunch of these in the umbraco log table:
Error removing node from umbraco index: 'System.IO.IOException: No sub-file with id _4.f8 found at Lucene.Net.Index.CompoundFileReader.OpenInput(String id) at Lucene.Net.Index.SegmentReader.OpenNorms(Directory cfsDir) at Lucene.Net.Index.SegmentReader.Initialize(SegmentInfo si) at Lucene.Net.Index.SegmentReader.Get(Directory dir, SegmentInfo si, SegmentInfos sis, Boolean closeDir, Boolean ownDir) at Lucene.Net.Index.IndexReader.AnonymousClassWith.DoBody() at Lucene.Net.Store.Lock.With.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory) at Lucene.Net.Index.IndexReader.Open(String path) at umbraco.cms.businesslogic.index.Indexer.RemoveNode(Int32 Id)'
I'd like to rebuild the index, but can't tell where the files are that I'd have to delete.
Any suggestions how to debug the larger perf issue?
I'm reindexing that now. Despite hitting "refresh", I'm not seeing any actual documents getting indexed (unless it really takes forever to calculate the number of documents), so maybe I have a problem with Lucene.
0 of Not calculated yet documents indexed. Currently indexing ''
I do see files getting added and removed in that index folder, though, so maybe it's just the page having issues.
As an aside, that reindexing page seems like a bad page to have laying open. Is that using umbraco security or is it just open to the public?
Hi Pete, the indexing page requires the user to be logged in to the umbraco backend (maybe even as admin - I'm not quite sure) - if the user is not logged in, he is redirected to the login page.
Ok, looks like the problem is the Related Posts xslt that comes with Blog4Umbraco. It is enumerating a huge number of blog posts, first getting all that have any of the same tags (using tagslib:getContentWithTags) and then sorting them by most tag matches and then createdate.
Does anyone have a more performant one, or suggestions for improving it? I was actually looking forward to finally having that on my blog, but not at this price :)
I think I actually did that code. But I did not optimize it for that large an amount of posts. It is probably the weighting part that is hurting it most. Sorry about that :)
Maybe it would be better to do the entire weighting in one clean sql query?
Ok, so the problem isn't with any one xslt, page or template. I may misdiagnosed earlier with the related posts one, as it looked like perf got better when I removed it and worse when I added it. (sorry if perf is ok in that. my mistake)
The performance issues are random and pretty universal. I'll hit a dog simple page, and it'll take 5-20 seconds to load up, while my server CPU is spiking. Complex pages, same thing. Sometimes they're really quick, sometimes slow. Sometimes the server chokes and I have to restart the app pool.
This is not right after restarting the app - it's far far worse then. The site is unavailable for almost a minute while umbraco starts up, and then individual pages are slow to come up.
I've rebuilt the indexes. I've also looked at the umbraco.config xml disk cache, and it seems ok.
Umbraco is running in its own dedicated app pool. Permissions appear to be correct (all the right files get updated)
I currently have all cache settings in the xslts set to 0 (not sure if that means to cache forever or not at all), but I want to get raw perf addressed before I start tweaking caching.
Suggestions on what to look for here? I've seen several other perf posts in the forums here that sound like mine, but they tend to just stay unresolved.
Pagedata loaded for Wallpaper (ID: 4851, Version: 636fc74b-11fb-4939-bb98-b72875ea15cc)
0.000924698530120448
0.000113
aspx.page
End PreInit
1.00078920644942
0.999865
it took several seconds to render, but it's only one second. The home page took several seconds to render in the browser, but I checked the trace, and end render happened pretty quickly
aspx.page
End Render
0.11565911309957
0.002032
This one took about 15 seconds to render http://10rem.net/blog/?umbdebugshowtrace=true , but the debug trace shows it should be much faster (whole page completed in 0.25 seconds according to the trace)
umbracoMacro
Before performing transformation
0.00741435014785399
0.000043
umbracoMacro
After performing transformation
0.223718377615032
0.216304
...
aspx.page
End Render
0.245974558218992
0.003142
This one took 17 seconds to render, and showed spikes on the server lasting several seconds:
aspx.page
End Render
0.157448502533143
0.018587
So, it seems like it is something outside the pipeline? However, the other sites on the same box are really fast.
I'm going to keep evaluating and see if anything sticks out. It's very random and hard to track.
For grins, I took the 500+ url rewrite rules out of the url rewriting config to see if that's it. I bounced the site (added a blank line to web.config). Initial startup was slow, but it seems like the pages are all coming back quickly. I don't want to assume that's the problem (I keep chasing the wrong things here) but looks promising.
If the url rewriter is the problem, I'll need to find another way to redirect all my old blog posts and site pages (eventually will have several hundred more rules to add) to the new urls in umbraco. Maybe keep those domains up and put the rewriting there instead.
Also, FWIW, my worker process for the umbraco site (even though recently bounced and site isn't getting any traffic except my tests and maybe you guys), total cpu time is already at 0:30:29 whereas my blog, which I haven't bounced in ages is at 0:04:54. Both are on the same server (different app pools), both use the same back-end database server. It's also using over 2x the memory (264,832k vs. community server at 106,424k), but I'm less concerned about that.
So the server, iis, asp.net, disk, memory etc. are all working fine.
Does your site run in it's own application-pool? If not, try putting it in its own application pool.
Do you also have access to the event-logs of windows? Could be that there's some errror there which causes the pool to recycle (thus restarting the app, thus a slow initial request)
I noticed 18 seconds on the lab-page as well, only on the first request however. I could not notice anything strange in the trace.
You could also try to have a look at the umbracoLog-table. See if there's any errors in there.
You say you use a named account. Do you mean an account from Active Directory? If so, try to change it to Network Service. That's a local account and that way you rule out any network-hiccups to the domain controller.
Do you use a SQL-login to connect to the DB or an active directory one? Do you have slow responses in the backend as well or does it only occur at the front-end?
Do you use a lot of media-items? (From what I've seen you don't, but perhaphs you have some GetMedia things in a few macros?)
I have a couple media items (the book image on the right on the blog and about pages, the two images on the lab page), but nothing else. My old blog posts were all imported with their images left external. New blog posts will have images in the media library.
I use a SQL login to connect to the database server.
Database server peaks at about 7% utilization, usually around 1-2%
I temporarily changed the account to local system to try out perf. Some pages were fast, but then I came across the strange slowdown again. About 15-20 seconds to load one of my blog pages, and cpu spiked.
Just changed back to the named account and hit a page on the site. DB server barely sees any traffic (stayed around 1% the entire time), so that's not the bottleneck. I'd have expected to see it in the trace if it were, but you never know.
Hmm. I've had the umbraco admin console up in the background this whole time. I wasn't doing anything in it during the tests. Does it auto-refresh or anything crazy like that?
I shut down the console and forced a site restart to get cache out of the picture. Once it was back up from the bounce (really wish that didn't take so long) site is screaming. I can't get anything to give me the slow-down. I've restarted the site several times during my testing and have always seen the showdown in fairly short order.
I'd like to be able to work on the site while it's up, and I have no way to get non-blog pages on the site other than the umbraco admin dashboard/console, so that's a bit of a problem.
So, I brought the console back up. Still not seeing any problems, so not sure that's the issue unless it leaks.
I'll continue testing before I'm convinced one way or the other :P
2. Yes and yes. It'll be production for this site once I go live. I do compilation on my local machine but don't have a full umbraco setup on my workstation.
3. Downloading dottrace. 10 day clock is ticking ;)
Not sure dottrace is going to tell me a whole lot, but there are a few tidbits I found in it.
During site spin-up (from a restart of the site/worker process), it appears (if I'm reading this right) that 59.56% of the time (43 seconds total) is spent in System.Text.RegularExpressions.CompiledRegexRunner.FindFirstChar(). Where is it being called from? UrlRewriting.net.web.Regex.IsMatch.
Keep in mind that I removed the 500 rules from my urlrewriting config file. There are only like 3 rules in there.
59.56 % FindFirstChar - 43128 ms - 2320 calls - System.Text.RegularExpressions.CompiledRegexRunner.FindFirstChar() 59.56 % Scan - 43128 of 43128 ms - 2320 of 2320 calls - System.Text.RegularExpressions.RegexRunner.Scan(Regex, String, Int32, Int32, Int32, Int32, Boolean) 59.56 % Run - 43128 of 43128 ms - 2320 of 2320 calls - System.Text.RegularExpressions.Regex.Run(Boolean, Int32, String, Int32, Int32, Int32) 59.56 % IsMatch - 43128 of 43128 ms - 2320 of 2320 calls - System.Text.RegularExpressions.Regex.IsMatch(String) 59.56 % IsRewrite - 43127 of 43128 ms - 2240 of 2320 calls - UrlRewritingNet.Web.RegExRewriteRule.IsRewrite(String) 59.56 % RedirectUrl - 43126 of 43128 ms - 2232 of 2320 calls - UrlRewritingNet.Web.UrlRewriteModule.RedirectUrl(HttpApplication) 59.56 % OnBeginRequest - 43126 of 43128 ms - 2232 of 2320 calls - UrlRewritingNet.Web.UrlRewriteModule.OnBeginRequest(Object, EventArgs) 59.56 % Execute - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.HttpApplication.SyncEventExecutionStep.Execute() (from System.Web.HttpApplication.IExecutionStep) 59.56 % ExecuteStep - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean &) 59.56 % ResumeSteps - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.HttpApplication.PipelineStepManager.ResumeSteps(Exception) 59.56 % BeginProcessRequestNotification - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.HttpApplication.BeginProcessRequestNotification(HttpContext, AsyncCallback) 59.56 % ProcessRequestNotificationPrivate - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest, HttpContext) 59.56 % ProcessRequestNotificationHelper - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32) 59.56 % ProcessRequestNotification - 43126 of 43128 ms - 2232 of 2320 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32) 59.56 % Thread #20088400 - 43125 of 43128 ms - 1116 of 2320 calls 0.00 % Thread #20089400 - 1 of 43128 ms - 1116 of 2320 calls 0.00 % RewriteUrl - 1 of 43128 ms - 8 of 2320 calls - UrlRewritingNet.Web.UrlRewriteModule.RewriteUrl(HttpApplication) 0.00 % ValidateKeyValuePair - 1 of 43128 ms - 40 of 2320 calls - System.Data.Common.DbConnectionOptions.ValidateKeyValuePair(String, String) 0.00 % set_Item - 1 of 43128 ms - 40 of 2320 calls - System.Data.Common.DbConnectionStringBuilder.set_Item(String, Object) 0.00 % set_ConnectionString - 1 of 43128 ms - 40 of 2320 calls - System.Data.Common.DbConnectionStringBuilder.set_ConnectionString(String) 0.00 % CreateSqlHelper - 1 of 43128 ms - 40 of 2320 calls - umbraco.DataLayer.DataLayerHelper.CreateSqlHelper(String) 0.00 % get_SqlHelper - 1 of 43128 ms - 8 of 2320 calls - umbraco.BusinessLogic.Application.get_SqlHelper() 0.00 % GetCommentsForPost - 0 of 43128 ms - 32 of 2320 calls - Umlaut.Umb.Blog.BlogLibrary.GetCommentsForPost(Int32) 0.00 % AppendKeyValuePairBuilder - 0 of 43128 ms - 40 of 2320 calls - System.Data.Common.DbConnectionOptions.AppendKeyValuePairBuilder(StringBuilder, String, String, Boolean) 0.00 % get_ConnectionString - 0 of 43128 ms - 40 of 2320 calls - System.Data.Common.DbConnectionStringBuilder.get_ConnectionString() 0.00 % CreateSqlHelper - 0 of 43128 ms - 40 of 2320 calls - umbraco.DataLayer.DataLayerHelper.CreateSqlHelper(String) 0.00 % get_SqlHelper - 0 of 43128 ms - 8 of 2320 calls - umbraco.BusinessLogic.Application.get_SqlHelper() 0.00 % GetCommentsForPost - 0 of 43128 ms - 32 of 2320 calls - Umlaut.Umb.Blog.BlogLibrary.GetCommentsForPost(Int32)
The next highest % usage after the 59.56% is 1.95%
Maybe it needs to be called that much during startup, or maybe the code just isn't well-behaved. Either way, the 43 seconds it takes certainly explains the long startup of the site.
There are around 550 pages on the site. Not sure why RedirectUrl is called 2,232 times during startup.
Still trying to make sense of individual request delays, but wanted to post this.
Now, on to page requests. So I surfed the site a little and got a couple slow requests. This snapshot was started after the site was up, and was ended just after I managed to get a couple slow requests.
The only hotspot shown here is WaitAny, which I assume is worker process idle time, and nothing useful to us. I'm posting it here in case my assumption is incorrect.
89.76 % WaitAny - 88714 ms - 3 calls - System.Threading.WaitHandle.WaitAny(WaitHandle [], Int32, Boolean) 89.76 % ThreadProc - 88714 of 88714 ms - 3 of 3 calls - System.Net.TimerThread.ThreadProc() 89.76 % ThreadStart_Context - 88714 of 88714 ms - 3 of 3 calls - System.Threading.ThreadHelper.ThreadStart_Context(Object) 89.76 % Run - 88714 of 88714 ms - 3 of 3 calls - System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object) 89.76 % ThreadStart - 88714 of 88714 ms - 3 of 3 calls - System.Threading.ThreadHelper.ThreadStart() 89.76 % Thread #20096400 - 88714 of 88714 ms - 3 of 3 calls
The next highest hotspot after that is only 0.59% of the CPU time total - nothing remotely significant. That's the call to SQL Server.
0.59 % SNIReadSync* - 580 ms - 362 calls - .SNIReadSync*(SNI_Conn *, SNI_Packet * *, Int32) 0.59 % SNIReadSync - 580 of 580 ms - 362 of 362 calls - SNINativeMethodWrapper.SNIReadSync(SafeHandle, IntPtr &, Int32) 0.59 % ReadSni - 580 of 580 ms - 362 of 362 calls - System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult, TdsParserStateObject) 0.59 % ReadNetworkPacket - 580 of 580 ms - 362 of 362 calls - System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() 0.57 % ReadBuffer - 564 of 580 ms - 331 of 362 calls - System.Data.SqlClient.TdsParserStateObject.ReadBuffer() 0.02 % ConsumePreLoginHandshake - 16 of 580 ms - 31 of 362 calls - System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean, Boolean, Boolean &)
Next after that is XmlILCommand.Execute. Again, insignificant cpu usage.
0.38 % Execute - 378 ms - 122 calls - System.Xml.Xsl.XmlILCommand.Execute(Object, XmlResolver, XsltArgumentList, XmlSequenceWriter) 0.38 % Execute - 378 of 378 ms - 122 of 122 calls - System.Xml.Xsl.XmlILCommand.Execute(Object, XmlResolver, XsltArgumentList, XmlWriter, Boolean) 0.38 % Execute - 378 of 378 ms - 122 of 122 calls - System.Xml.Xsl.XmlILCommand.Execute(IXPathNavigable, XmlResolver, XsltArgumentList, XmlWriter) 0.38 % Execute - 378 of 378 ms - 122 of 122 calls - System.Xml.Xsl.XmlILCommand.Execute(IXPathNavigable, XmlResolver, XsltArgumentList, TextWriter) 0.38 % Transform - 378 of 378 ms - 122 of 122 calls - System.Xml.Xsl.XslCompiledTransform.Transform(IXPathNavigable, XsltArgumentList, TextWriter) 0.38 % GetXsltTransformResult - 378 of 378 ms - 122 of 122 calls - umbraco.macro.GetXsltTransformResult(XmlDocument, XslCompiledTransform, Dictionary<String, Object>) 0.37 % XsltTransform - 369 of 378 ms - 35 of 122 calls - umbraco.presentation.templateControls.ItemRenderer.XsltTransform(String, String, Boolean) 0.01 % loadMacroXSLT - 9 of 378 ms - 87 of 122 calls - umbraco.macro.loadMacroXSLT(macro, Hashtable, Hashtable)
If I look at the 100% "all threads" view, I seee some timer calls that take up less than 1% of time. Looks like that is the call to keep alive.
So, the startup information from the previous post was interesting, but the information here doesn't really offer any pointers, assuming I correctly understand the role of WaitAny.
So I went and just checked on the url rewriting. I renamed the wrong file (d'oh!) so I still had 500+ entries in the config. NOt 2500, but 500, so at least some of the urlrwriter activity is justified.
With url rewriting trimmed down, site startup time is significantly improved.
I think I'm going to have to put the rewriter on my old sites and leave them up (but empty except for rewriting) rather than point all the domains to umbraco and let the urlrewriter sort it out.
With the rewriter handling just 3 rules, here's what startup looks like. Assembly loading time is 14%, Beyond that, umbraco code to load types is anotehr 12% and getting xml content is also aboutg 12%.
13.94 % nLoad - 532.6 ms - 113 calls - System.Reflection.Assembly.nLoad(AssemblyName, String, Evidence, Assembly, StackCrawlMark &, Boolean, Boolean) 13.94 % InternalLoad - 532.6 of 532.6 ms - 113 of 113 calls - System.Reflection.Assembly.InternalLoad(AssemblyName, Evidence, StackCrawlMark &, Boolean) 13.37 % InternalLoad - 510.7 of 532.6 ms - 69 of 113 calls - System.Reflection.Assembly.InternalLoad(String, Evidence, StackCrawlMark &, Boolean) 0.49 % Load - 18.7 of 532.6 ms - 36 of 113 calls - System.Reflection.Assembly.Load(AssemblyName) 0.08 % InternalLoadFrom - 3.2 of 532.6 ms - 8 of 113 calls - System.Reflection.Assembly.InternalLoadFrom(String, Evidence, Byte [], AssemblyHashAlgorithm, Boolean, StackCrawlMark &) 11.77 % GetExportedTypes - 449.8 ms - 33 calls - System.Reflection.Assembly.GetExportedTypes() 11.77 % FindClassesOfType - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.Utils.TypeFinder.FindClassesOfType(Type) 11.77 % FindClassesOfType<T> - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.Utils.TypeFinder.FindClassesOfType<T>(Boolean) 11.77 % RegisterIApplications - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.Application.RegisterIApplications() 11.77 % Application..cctor - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.Application..cctor() 11.77 % setupUser - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.User.setupUser(Int32) 11.77 % User..ctor - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.User..ctor(Int32) 11.77 % GetUser - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.BusinessLogic.User.GetUser(Int32) 11.77 % InitializeApplication - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.presentation.requestModule.InitializeApplication(HttpApplication) 11.77 % Application_BeginRequest - 449.8 of 449.8 ms - 33 of 33 calls - umbraco.presentation.requestModule.Application_BeginRequest(Object, EventArgs) 11.77 % Execute - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.HttpApplication.SyncEventExecutionStep.Execute() (from System.Web.HttpApplication.IExecutionStep) 11.77 % ExecuteStep - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean &) 11.77 % ResumeSteps - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.HttpApplication.PipelineStepManager.ResumeSteps(Exception) 11.77 % BeginProcessRequestNotification - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.HttpApplication.BeginProcessRequestNotification(HttpContext, AsyncCallback) 11.77 % ProcessRequestNotificationPrivate - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest, HttpContext) 11.77 % ProcessRequestNotificationHelper - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32) 11.77 % ProcessRequestNotification - 449.8 of 449.8 ms - 33 of 33 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32) 11.77 % Thread #20089400 - 449.8 of 449.8 ms - 33 of 33 calls 11.72 % get_XmlContentInternal - 447.7 ms - 4 calls - umbraco.content.get_XmlContentInternal() 11.71 % get_XmlContent - 447.2 of 447.7 ms - 3 of 4 calls - umbraco.content.get_XmlContent() 11.71 % Page_PreInit - 447.2 of 447.7 ms - 3 of 4 calls - umbraco.UmbracoDefault.Page_PreInit(Object, EventArgs) 11.71 % EventArgFunctionCaller - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, Object, Object, EventArgs) 11.71 % Callback - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object, EventArgs) 11.71 % OnPreInit - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.OnPreInit(EventArgs) 11.71 % PerformPreInit - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.PerformPreInit() 11.71 % ProcessRequestMain - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean) 11.71 % ProcessRequest - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.ProcessRequest(Boolean, Boolean) 11.71 % ProcessRequest - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.ProcessRequest() 11.71 % ProcessRequestWithNoAssert - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.ProcessRequestWithNoAssert(HttpContext) 11.71 % ProcessRequest - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.UI.Page.ProcessRequest(HttpContext) 11.71 % ProcessRequest - 447.2 of 447.7 ms - 3 of 4 calls - ASP.default_aspx.ProcessRequest(HttpContext) 11.71 % Execute - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.HttpApplication.CallHandlerExecutionStep.Execute() (from System.Web.HttpApplication.IExecutionStep) 11.71 % ExecuteStep - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean &) 11.71 % ResumeSteps - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.HttpApplication.PipelineStepManager.ResumeSteps(Exception) 11.71 % BeginProcessRequestNotification - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.HttpApplication.BeginProcessRequestNotification(HttpContext, AsyncCallback) 11.71 % ProcessRequestNotificationPrivate - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest, HttpContext) 11.71 % ProcessRequestNotificationHelper - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32) 11.71 % ProcessRequestNotification - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32) 11.71 % MgdIndicateCompletion* - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion*(IntPtr, RequestNotificationStatus &) 11.71 % ProcessRequestNotificationHelper - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr, IntPtr, IntPtr, Int32) 11.71 % ProcessRequestNotification - 447.2 of 447.7 ms - 3 of 4 calls - System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32) 11.71 % Thread #20089400 - 447.2 of 447.7 ms - 2 of 4 calls 0.00 % Thread #20088400 - 0.0 of 447.7 ms - 1 of 4 call 0.01 % cctor>b__0 - 0.5 of 447.7 ms - 1 of 4 call - umbraco.content.cctor>b__0(Object) (from <)
Regular request profiling still looks about the same
Your last post could be helpfull. I've looked in the code where WaitAny was being used. That is only in 1 place, and by default, 15 seconds is involved.
The place where it is used is the part for load-balancing.
Can you check your config/umbracoSettings.config file and look for distributedCall
Is it set to true in your case? If so, please set to false (you're only having 1 server right?) Also, under the servers-part, is there a server listed? If so, comment that one out. (I base this on the default umbracoSettings.config file)
Another way to go with the redirects of old pages is to create an INotFoundHandler that gets triggered if the incoming url isn't matched to anything by umbraco. Then you could do the redirect yourself, and the code will only get run when it's needed. That's the way I solved it on my own blog.
distributedCall is set to false and no servers are listed:
<!-- distributed calls make umbraco use webservices to handle cache refreshing --> <distributedCall enable="false"> <!-- the id of the user who's making the calls --> <!-- needed for security, umbraco will automatically look up correct login and passwords --> <user>0</user> <servers> <!-- add ip number or hostname, make sure that it can be reached from all servers --> <!-- <server>127.0.0.1</server>--> </servers> </distributedCall>
Here's the full subtree for the waitany call. It doesn't show any umbraco code in the stack (I don't think the profiler picks up who started a thread), but I do see a .net timer in there, so that says to me this is probably umbraco code.
93.93 % WaitAny - 130577 ms - 4 calls - System.Threading.WaitHandle.WaitAny(WaitHandle [], Int32, Boolean) 93.93 % ThreadProc - 130577 of 130577 ms - 4 of 4 calls - System.Net.TimerThread.ThreadProc() 93.93 % ThreadStart_Context - 130577 of 130577 ms - 4 of 4 calls - System.Threading.ThreadHelper.ThreadStart_Context(Object) 93.93 % Run - 130577 of 130577 ms - 4 of 4 calls - System.Threading.ExecutionContext.Run(ExecutionContext, ContextCallback, Object) 93.93 % ThreadStart - 130577 of 130577 ms - 4 of 4 calls - System.Threading.ThreadHelper.ThreadStart() 93.93 % Thread #361549768 - 130577 of 130577 ms - 4 of 4 calls
Maybe not. *so far* I have had very few issues with url rewriting cut down to those 3 or so rules vs. the 500+. It may really be the root of the problem.
I'm crossing my fingers for the moment. I'll update this thread one way or the other this week.
I'm not saying that's definitely the issue, just don't want to leave you guys in the dark :)
So far, so good. I created a custom http module that had the redirects hard-coded (I spit out the switch statement as part of the conversion process) and the site is performing well.
Just so everyone knows, it appears that the UrlRewriting.net component doesn't do well with hundreds of rules. It's just not suited for what I needed to do (migration while keeping links active)
I've had problems with lots of rules on UrlRewritiung.Net as well in the past. I've found that ISAPI_REWRITE for IIS to be better for lots of rules (if you're using IIS6) or you can use the rewriting engine for IIS7 if you have that.
75-100 percent cpu utilization spikes with single site visitor
Third time is the charm. the % in the title made it into the post url, and browsers weren't allowing navigation to the post. Should probably fix that :)
Sorry if this is a dup. I waited 15 minutes for my other post to show up and it hasn't yet. Bummer. In any case, this post has a little more info.
I recently (yesterday) imported 500+ blog posts and 2000+ comments from community server. I'll have several hundred more pages to go before the site is fully stocked. I had planned to go live with the blog this week, but this perf issue is a showstopper.
When hitting a single page, I'm seeing CPU spikes from 50% to 100%. That's loading the server, and making individual pages take several seconds to load. For example, this page had a 4 second CPU spike of 100%.
http://10rem.net/2010/02/22/windows-client-developer-roundup-for-2222010
Once initially loaded, it's good for several minutes (presumably whatever the cache time is)
Most of my customization has been limited to skinning/templates. I've modified some xslt, but mainly just to change styles and layout. xml caching is turned on in the config. I have 500+ entries in the urlrewriting config file to map the old community server urls to the new umbraco ones.
I'm not running the image resizer (saw in other posts that it had a memory leak), and I'm not seeing anything that makes sense in the debug trace. Memory usage is pretty consistent at around 350mb. I see no errors in the windows event log.
The server (older dual proc xeon 2.4ghz) runs my blog under community server with CPU usage typically in the < 3% range, so the machine is capable.
Not sure if this is the problem, or a red herring, but I see a bunch of these in the umbraco log table:
Error removing node from umbraco index: 'System.IO.IOException: No sub-file with id _4.f8 found at Lucene.Net.Index.CompoundFileReader.OpenInput(String id) at Lucene.Net.Index.SegmentReader.OpenNorms(Directory cfsDir) at Lucene.Net.Index.SegmentReader.Initialize(SegmentInfo si) at Lucene.Net.Index.SegmentReader.Get(Directory dir, SegmentInfo si, SegmentInfos sis, Boolean closeDir, Boolean ownDir) at Lucene.Net.Index.IndexReader.AnonymousClassWith.DoBody() at Lucene.Net.Store.Lock.With.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory) at Lucene.Net.Index.IndexReader.Open(String path) at umbraco.cms.businesslogic.index.Indexer.RemoveNode(Int32 Id)'
I'd like to rebuild the index, but can't tell where the files are that I'd have to delete.
Any suggestions how to debug the larger perf issue?
site: http://10rem.net
Thanks.
Pete
Hi Pete,
to rebuild the index, delete all files in the /data/_systemUmbracoIndexDontDelete folder.
After that, call /umbraco/reindex.aspx
That should start the reindexing. There have been others reporting issues with lucene as well, perhaphs there's some additional info on the forum.
HTH,
Peter
Thanks Peter
I'm reindexing that now. Despite hitting "refresh", I'm not seeing any actual documents getting indexed (unless it really takes forever to calculate the number of documents), so maybe I have a problem with Lucene.
0 of Not calculated yet documents indexed. Currently indexing ''
I do see files getting added and removed in that index folder, though, so maybe it's just the page having issues.
As an aside, that reindexing page seems like a bad page to have laying open. Is that using umbraco security or is it just open to the public?
Pete
Hi Pete, the indexing page requires the user to be logged in to the umbraco backend (maybe even as admin - I'm not quite sure) - if the user is not logged in, he is redirected to the login page.
>Tommy
Thanks Tommy
Pete
Ok, looks like the problem is the Related Posts xslt that comes with Blog4Umbraco. It is enumerating a huge number of blog posts, first getting all that have any of the same tags (using tagslib:getContentWithTags) and then sorting them by most tag matches and then createdate.
BlogpostRelatedPages.xslt
http://blog4umbraco.codeplex.com/sourcecontrol/changeset/view/39717?projectName=blog4umbraco#692927
Does anyone have a more performant one, or suggestions for improving it? I was actually looking forward to finally having that on my blog, but not at this price :)
Pete
Yikes - that stinks. Will look at it tomorrow!
I think I actually did that code. But I did not optimize it for that large an amount of posts. It is probably the weighting part that is hurting it most. Sorry about that :)
Maybe it would be better to do the entire weighting in one clean sql query?
@Morten and @Niels
No problem guys. If you come up with a solution, let me know. I'll be happy to test it out.
My blog covers a number of topics within a few discrete areas, so I tend to have lots of tags, and lots of posts with the same tag.
Thanks
Pete
Ok, so the problem isn't with any one xslt, page or template. I may misdiagnosed earlier with the related posts one, as it looked like perf got better when I removed it and worse when I added it. (sorry if perf is ok in that. my mistake)
The performance issues are random and pretty universal. I'll hit a dog simple page, and it'll take 5-20 seconds to load up, while my server CPU is spiking. Complex pages, same thing. Sometimes they're really quick, sometimes slow. Sometimes the server chokes and I have to restart the app pool.
This is not right after restarting the app - it's far far worse then. The site is unavailable for almost a minute while umbraco starts up, and then individual pages are slow to come up.
I've rebuilt the indexes. I've also looked at the umbraco.config xml disk cache, and it seems ok.
Umbraco is running in its own dedicated app pool. Permissions appear to be correct (all the right files get updated)
I currently have all cache settings in the xslts set to 0 (not sure if that means to cache forever or not at all), but I want to get raw perf addressed before I start tweaking caching.
Suggestions on what to look for here? I've seen several other perf posts in the forums here that sound like mine, but they tend to just stay unresolved.
Pete
@Morton If it is related to the weighting of the tags: Last May I have made an effort to add this functionality to the Tag-Datatype. I've uploaded it as a patch to codeplex and wrote a blogpost about it: http://blog.thoehler.com/2009/05/14/new-package-blogcumulus-v10-published
hth, Thomas
Pete, if you request the pages with the ?umbdebugshowtrace=true does that give you any useful data on what seems to be taking a lot of time?
Here's a page that has a label (http://10rem.net/wallpaper)
it took several seconds to render, but it's only one second. The home page took several seconds to render in the browser, but I checked the trace, and end render happened pretty quickly
This one took about 15 seconds to render http://10rem.net/blog/?umbdebugshowtrace=true , but the debug trace shows it should be much faster (whole page completed in 0.25 seconds according to the trace)
...
This one took 17 seconds to render, and showed spikes on the server lasting several seconds:
So, it seems like it is something outside the pipeline? However, the other sites on the same box are really fast.
I'm going to keep evaluating and see if anything sticks out. It's very random and hard to track.
For grins, I took the 500+ url rewrite rules out of the url rewriting config to see if that's it. I bounced the site (added a blank line to web.config). Initial startup was slow, but it seems like the pages are all coming back quickly. I don't want to assume that's the problem (I keep chasing the wrong things here) but looks promising.
If the url rewriter is the problem, I'll need to find another way to redirect all my old blog posts and site pages (eventually will have several hundred more rules to add) to the new urls in umbraco. Maybe keep those domains up and put the rewriting there instead.
Ack, nope, just had another random slow page. http://10rem.net/lab?umbdebugshowtrace=true . Took about 18 seconds, showed CPU spikes, but page trace shows nothing:
Ideas? Umbraco Log shows no issues. Windows app event log shows no problems.
Pete
Also, FWIW, my worker process for the umbraco site (even though recently bounced and site isn't getting any traffic except my tests and maybe you guys), total cpu time is already at 0:30:29 whereas my blog, which I haven't bounced in ages is at 0:04:54. Both are on the same server (different app pools), both use the same back-end database server. It's also using over 2x the memory (264,832k vs. community server at 106,424k), but I'm less concerned about that.
So the server, iis, asp.net, disk, memory etc. are all working fine.
Pete
Does your site run in it's own application-pool? If not, try putting it in its own application pool.
Do you also have access to the event-logs of windows? Could be that there's some errror there which causes the pool to recycle (thus restarting the app, thus a slow initial request)
I noticed 18 seconds on the lab-page as well, only on the first request however. I could not notice anything strange in the trace.
You could also try to have a look at the umbracoLog-table. See if there's any errors in there.
Let us know what you find.
Peter
@Peter
We may have crossed messages :)
So I'm working from no real info. Thanks for checking out the page. I appreciate the help.
The server is on the other side of a wall here, and I have full remote access, so I can try pretty much anything.
Pete
Ok, lets try a different approach then :)
You say you use a named account. Do you mean an account from Active Directory? If so, try to change it to Network Service. That's a local account and that way you rule out any network-hiccups to the domain controller.
Do you use a SQL-login to connect to the DB or an active directory one? Do you have slow responses in the backend as well or does it only occur at the front-end?
Do you use a lot of media-items? (From what I've seen you don't, but perhaphs you have some GetMedia things in a few macros?)
Peter
I have a couple media items (the book image on the right on the blog and about pages, the two images on the lab page), but nothing else. My old blog posts were all imported with their images left external. New blog posts will have images in the media library.
I use a SQL login to connect to the database server.
Database server peaks at about 7% utilization, usually around 1-2%
I temporarily changed the account to local system to try out perf. Some pages were fast, but then I came across the strange slowdown again. About 15-20 seconds to load one of my blog pages, and cpu spiked.
Here's what a spike looks like:
http://www.irritatedvowel.com/pub/umbraco_site_cpu_spike.jpg
Sometimes they plateau at 100% for 4+ seconds.
Very unpredictable.
Pete
Just changed back to the named account and hit a page on the site. DB server barely sees any traffic (stayed around 1% the entire time), so that's not the bottleneck. I'd have expected to see it in the trace if it were, but you never know.
Pete
Hmm. I've had the umbraco admin console up in the background this whole time. I wasn't doing anything in it during the tests. Does it auto-refresh or anything crazy like that?
I shut down the console and forced a site restart to get cache out of the picture. Once it was back up from the bounce (really wish that didn't take so long) site is screaming. I can't get anything to give me the slow-down. I've restarted the site several times during my testing and have always seen the showdown in fairly short order.
I'd like to be able to work on the site while it's up, and I have no way to get non-blog pages on the site other than the umbraco admin dashboard/console, so that's a bit of a problem.
So, I brought the console back up. Still not seeing any problems, so not sure that's the issue unless it leaks.
I'll continue testing before I'm convinced one way or the other :P
Pete
Nope. That wasn't it either.
ObservableCollection<Straw>.Grasp()
Pete
Should I try 4.1 beta instead? Not sure about stability of that, but I understand from here that it has some perf improvements.
Pete
Hi Pete! I have a few questions:
1) What is the size of the Xml cache file /data/umbraco.config after population?
2) Do you see the same issue in your dev environment, or is this machine your dev environment?
3) If "yes" to either part of 2, can you attach a profiler such as dotTrace to dig deeper? :)
I wouldn't recommend 4.1 beta2 if you want to go live, it could be tricky to downgrade if you run into problems
Cheers
Alex
@Alex
1: 2.68 MB (2,812,281 bytes)
2. Yes and yes. It'll be production for this site once I go live. I do compilation on my local machine but don't have a full umbraco setup on my workstation.
3. Downloading dottrace. 10 day clock is ticking ;)
Pete
Not sure dottrace is going to tell me a whole lot, but there are a few tidbits I found in it.
During site spin-up (from a restart of the site/worker process), it appears (if I'm reading this right) that 59.56% of the time (43 seconds total) is spent in System.Text.RegularExpressions.CompiledRegexRunner.FindFirstChar(). Where is it being called from? UrlRewriting.net.web.Regex.IsMatch.
Keep in mind that I removed the 500 rules from my urlrewriting config file. There are only like 3 rules in there.
The next highest % usage after the 59.56% is 1.95%
Maybe it needs to be called that much during startup, or maybe the code just isn't well-behaved. Either way, the 43 seconds it takes certainly explains the long startup of the site.
There are around 550 pages on the site. Not sure why RedirectUrl is called 2,232 times during startup.
Still trying to make sense of individual request delays, but wanted to post this.
Pete
Now, on to page requests. So I surfed the site a little and got a couple slow requests. This snapshot was started after the site was up, and was ended just after I managed to get a couple slow requests.
The only hotspot shown here is WaitAny, which I assume is worker process idle time, and nothing useful to us. I'm posting it here in case my assumption is incorrect.
The next highest hotspot after that is only 0.59% of the CPU time total - nothing remotely significant. That's the call to SQL Server.
Next after that is XmlILCommand.Execute. Again, insignificant cpu usage.
If I look at the 100% "all threads" view, I seee some timer calls that take up less than 1% of time. Looks like that is the call to keep alive.
So, the startup information from the previous post was interesting, but the information here doesn't really offer any pointers, assuming I correctly understand the role of WaitAny.
Pete
So I went and just checked on the url rewriting. I renamed the wrong file (d'oh!) so I still had 500+ entries in the config. NOt 2500, but 500, so at least some of the urlrwriter activity is justified.
With url rewriting trimmed down, site startup time is significantly improved.
I think I'm going to have to put the rewriter on my old sites and leave them up (but empty except for rewriting) rather than point all the domains to umbraco and let the urlrewriter sort it out.
With the rewriter handling just 3 rules, here's what startup looks like. Assembly loading time is 14%, Beyond that, umbraco code to load types is anotehr 12% and getting xml content is also aboutg 12%.
Regular request profiling still looks about the same
Again, nothing there that explains the multi-second requests.
Pete
Pete,
Kudo's for still hanging in there :)
Your last post could be helpfull. I've looked in the code where WaitAny was being used. That is only in 1 place, and by default, 15 seconds is involved.
The place where it is used is the part for load-balancing.
Can you check your config/umbracoSettings.config file and look for distributedCall
Is it set to true in your case? If so, please set to false (you're only having 1 server right?) Also, under the servers-part, is there a server listed? If so, comment that one out. (I base this on the default umbracoSettings.config file)
Does this help?
Regards,
Peter
Another way to go with the redirects of old pages is to create an INotFoundHandler that gets triggered if the incoming url isn't matched to anything by umbraco. Then you could do the redirect yourself, and the code will only get run when it's needed. That's the way I solved it on my own blog.
@Peter
distributedCall is set to false and no servers are listed:
Here's the full subtree for the waitany call. It doesn't show any umbraco code in the stack (I don't think the profiler picks up who started a thread), but I do see a .net timer in there, so that says to me this is probably umbraco code.
@Morten, good idea on the INotFoundHandler
Pete
Bah, yet another idea flushed down the toilet. I'm running out of ideas here....
Perhaphs someone from HQ has a clue here.
Maybe not. *so far* I have had very few issues with url rewriting cut down to those 3 or so rules vs. the 500+. It may really be the root of the problem.
I'm crossing my fingers for the moment. I'll update this thread one way or the other this week.
I'm not saying that's definitely the issue, just don't want to leave you guys in the dark :)
Pete
So far, so good. I created a custom http module that had the redirects hard-coded (I spit out the switch statement as part of the conversion process) and the site is performing well.
Just so everyone knows, it appears that the UrlRewriting.net component doesn't do well with hundreds of rules. It's just not suited for what I needed to do (migration while keeping links active)
Pete
I've had problems with lots of rules on UrlRewritiung.Net as well in the past. I've found that ISAPI_REWRITE for IIS to be better for lots of rules (if you're using IIS6) or you can use the rewriting engine for IIS7 if you have that.
I just stumpled into this problem myself.
Had about 500 rules in UrlRewriting.config, and CPU usage was at 100% almost all of the time!
I then commented out about 450 of the rules, and now CPU usage is at the normal 15%.
So I definately second that it looks like UrlRewriting can't handle a large number of rules.
Will have a look at ISAPI_REWRITE. I've used it in the past and been pretty happy with it.
/SoerenS
Another option could be looking at implementing a 404 handler. Might be more efficient?
I also had this problem and switched to the IIS7 URL Rewrite Module 2.0, where the rules are stored in web.config. Problem solved, at least so far.
is working on a reply...