Site suddenly starts exceed pool limits and fail to boot
Hi there
We have a site that's been running fine for months, but suddenly on the 19th of october and ever since, the site has been down multiple times a day.
The pattern is the same every time when looking in the log:
Multiple entries mentioning System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
System fails to enter log: System.TimeoutException: Failed to enter the lock within timeout.
After a retry the site will get back up by itself, but is down in the meantime. There's nothing prior to this in the log explaining it.
The site is running on a VM in Azure, with a seperate Azure VM as SQL server. It is a high demand site, but the cpu and memory usage on the server is at 20% at most.
The Umbraco version is at 8.6.8.
Any quailified guesses why this is happening, and why it started happening all of a sudden?
Full logs:
Timeout (there are many variations of this with different strack trace):
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.Open()
at NPoco.Database.OpenSharedConnectionImp(Boolean isInternal)
at NPoco.Database.BeginTransaction(IsolationLevel isolationLevel)
at Umbraco.Core.Scoping.Scope.get_Database() in D:\a\1\s\src\Umbraco.Core\Scoping\Scope.cs:line 242
at Umbraco.Core.Persistence.Repositories.Implement.RedirectUrlRepository.GetMostRecentUrl(String url) in D:\a\1\s\src\Umbraco.Core\Persistence\Repositories\Implement\RedirectUrlRepository.cs:line 162
at Umbraco.Core.Services.Implement.RedirectUrlService.GetMostRecentRedirectUrl(String url) in D:\a\1\s\src\Umbraco.Core\Services\Implement\RedirectUrlService.cs:line 76
at Umbraco.Web.Routing.ContentFinderByRedirectUrl.TryFindContent(PublishedRequest frequest) in D:\a\1\s\src\Umbraco.Web\Routing\ContentFinderByRedirectUrl.cs:line 39
at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source, Func`2 predicate)
at Umbraco.Web.Routing.PublishedRouter.FindPublishedContent(PublishedRequest request) in D:\a\1\s\src\Umbraco.Web\Routing\PublishedRouter.cs:line 424
at Umbraco.Web.Routing.PublishedRouter.FindPublishedContentAndTemplate(PublishedRequest request) in D:\a\1\s\src\Umbraco.Web\Routing\PublishedRouter.cs:line 386
at Umbraco.Web.Routing.PublishedRouter.PrepareRequest(PublishedRequest request) in D:\a\1\s\src\Umbraco.Web\Routing\PublishedRouter.cs:line 135
at Umbraco.Web.UmbracoInjectedModule.ProcessRequest(HttpContextBase httpContext) in D:\a\1\s\src\Umbraco.Web\UmbracoInjectedModule.cs:line 138
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)`
Failed to enter lock:
System.TimeoutException: Failed to enter the lock within timeout.
at Umbraco.Core.SystemLock.Lock(Int32 millisecondsTimeout) in D:\a\1\s\src\Umbraco.Core\SystemLock.cs:line 85
at Umbraco.Core.Runtime.MainDomSemaphoreLock.AcquireLockAsync(Int32 millisecondsTimeout) in D:\a\1\s\src\Umbraco.Core\Runtime\MainDomSemaphoreLock.cs:line 41
Boot failed:
Umbraco.Core.Exceptions.BootFailedException: Boot failed. ---> System.TimeoutException: Cannot acquire MainDom
at Umbraco.Core.Runtime.MainDom.Acquire() in D:\a\1\s\src\Umbraco.Core\Runtime\MainDom.cs:line 172
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
at Umbraco.Core.Runtime.MainDom.get_IsMainDom() in D:\a\1\s\src\Umbraco.Core\Runtime\MainDom.cs:line 204
at Umbraco.Core.Runtime.CoreRuntime.AcquireMainDom(IMainDom mainDom) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 263
at Umbraco.Core.Runtime.CoreRuntime.Boot(IRegister register, DisposableTimer timer) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 169
--- End of inner exception stack trace ---
This all seems related to the number of connections available in your SQL Server instance, when Umbraco starts up it uses a SQL query to obtain a 'MainLock', if SQL Server has no connections available, Umbraco cannot start.
When you try again presumably one is free and it can start.
Similarly the other error is when a 404 is about to happen, Umbraco's content finder for redirects will lookup a SQL database table to see if a redirect exists for the Url before returning a 404.
You can run the following SQL on the SQL Server instance, when the issue occurs to see how many connections are in place, and see if over time this increases and never decreases:
SELECT
DB_NAME(dbid) as DBName,
COUNT(dbid) as NumberOfConnections,
loginame as LoginName
FROM
sys.sysprocesses
WHERE
dbid > 0
GROUP BY
dbid, loginame
Essentially if this number keeps increasing, some code somwhere in the solution, or another solution that is using the database, is grabbing a connection to the db and not releasing it properly, meaning another connection in the pool cannot immediately be made until the blocking connection timesout...
So do you have any custom code that accesses the database?
In theory it could be a bug in Umbraco, but you'd expect there to be alot more issues like this reported if that were the case?
From what we've seen, the number of connections is never above 10. Afaik the maximum available connections is 32,767.
We do have custom code on the solution, and all data is stored in a separate table than the Umbraco db. All of our custom code is logged, and nothing shows up in the log when the issue happens.
Since the original post, we've start seeing another error that may be related:
Umbraco.Core.Exceptions.BootFailedException: Boot failed: Umbraco cannot run. See Umbraco's log file for more details.\n\n-> Umbraco.Core.Exceptions.BootFailedException: Boot failed.\n\n-> System.IO.IOException: The process cannot access the file 'C:\\windows\\TEMP\\UmbracoData\\aad2ba757d1fd0e1d26ac57d2c96ad1d\\NuCache\\NuCache.Content.db' because it is being used by another process.\n at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)\r\n at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)\r\n at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)\r\n at CSharpTest.Net.IO.TransactedCompoundFile..ctor(Options options)\r\n at CSharpTest.Net.Storage.BTreeFileStoreV2..ctor(Options options)\r\n at CSharpTest.Net.Collections.BPlusTree`2.OptionsV2.CreateStorage()\r\n at CSharpTest.Net.Collections.BPlusTree`2.NodeCacheBase..ctor(BPlusTreeOptions`2 options)\r\n at CSharpTest.Net.Collections.BPlusTree`2.NodeCacheNone..ctor(BPlusTreeOptions`2 options)\r\n at CSharpTest.Net.Collections.BPlusTree`2..ctor(BPlusTreeOptions`2 ioptions)\r\n at Umbraco.Web.PublishedCache.NuCache.DataSource.BTree.GetTree(String filepath, Boolean exists) in D:\\a\\1\\s\\src\\Umbraco.Web\\PublishedCache\\NuCache\\DataSource\\BTree.cs:line 32\r\n at Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.MainDomRegister() in D:\\a\\1\\s\\src\\Umbraco.Web\\PublishedCache\\NuCache\\PublishedSnapshotService.cs:line 173\r\n at Umbraco.Core.Runtime.MainDom.Register(Action install, Action release, Int32 weight) in D:\\a\\1\\s\\src\\Umbraco.Core\\Runtime\\MainDom.cs:line 90\r\n at Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService..ctor(PublishedSnapshotServiceOptions options, IMainDom mainDom, IRuntimeState runtime, ServiceContext serviceContext, IPublishedContentTypeFactory publishedContentTypeFactory, IdkMap idkMap, IPublishedSnapshotAccessor publishedSnapshotAccessor, IVariationContextAccessor variationContextAccessor, IProfilingLogger logger, IScopeProvider scopeProvider, IDocumentRepository documentRepository, IMediaRepository mediaRepository, IMemberRepository memberRepository, IDefaultCultureAccessor defaultCultureAccessor, IDataSource dataSource, IGlobalSettings globalSettings, IEntityXmlSerializer entitySerializer, IPublishedModelFactory publishedModelFactory, UrlSegmentProviderCollection urlSegmentProviders) in D:\\a\\1\\s\\src\\Umbraco.Web\\PublishedCache\\NuCache\\PublishedSnapshotService.cs:line 133\r\n at DynamicMethod(Object[] )\r\n at LightInject.PerContainerLifetime.GetInstance(Func`1 createInstance, Scope scope) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 6169\r\n at LightInject.ServiceContainer.EmitLifetime(ServiceRegistration serviceRegistration, Action`1 emitMethod, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4656\r\n at LightInject.ServiceContainer.<>c__DisplayClass153_0.<CreateEmitMethodWrapper>b__0(IEmitter ms) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3856\r\n at LightInject.ServiceContainer.EmitConstructorDependency(IEmitter emitter, Dependency dependency) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4158\r\n at LightInject.ServiceContainer.EmitConstructorDependencies(ConstructionInfo constructionInfo, IEmitter emitter, Action`1 decoratorTargetEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4120\r\n at LightInject.ServiceContainer.EmitNewInstanceUsingImplementingType(IEmitter emitter, ConstructionInfo constructionInfo, Action`1 decoratorTargetEmitMethod) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4081\r\n at LightInject.ServiceContainer.EmitNewInstance(ServiceRegistration serviceRegistration, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4038\r\n at LightInject.ServiceContainer.CreateDynamicMethodDelegate(Action`1 serviceEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3777\r\n at LightInject.ServiceContainer.<>c__DisplayClass198_0.<EmitLifetime>b__1() in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4657\r\n at LightInject.PerContainerLifetime.GetInstance(Func`1 createInstance, Scope scope) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 6169\r\n at LightInject.ServiceContainer.EmitLifetime(ServiceRegistration serviceRegistration, Action`1 emitMethod, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4656\r\n at LightInject.ServiceContainer.<>c__DisplayClass153_0.<CreateEmitMethodWrapper>b__0(IEmitter ms) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3856\r\n at LightInject.ServiceContainer.EmitConstructorDependency(IEmitter emitter, Dependency dependency) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4158\r\n at LightInject.ServiceContainer.EmitConstructorDependencies(ConstructionInfo constructionInfo, IEmitter emitter, Action`1 decoratorTargetEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4120\r\n at LightInject.ServiceContainer.EmitNewInstanceUsingImplementingType(IEmitter emitter, ConstructionInfo constructionInfo, Action`1 decoratorTargetEmitMethod) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4081\r\n at LightInject.ServiceContainer.EmitNewInstance(ServiceRegistration serviceRegistration, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4038\r\n at LightInject.ServiceContainer.CreateDynamicMethodDelegate(Action`1 serviceEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3777\r\n at LightInject.ServiceContainer.<>c__DisplayClass198_0.<EmitLifetime>b__1() in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4657\r\n at LightInject.PerContainerLifetime.GetInstance(Func`1 createInstance, Scope scope) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 6169\r\n at LightInject.ServiceContainer.EmitLifetime(ServiceRegistration serviceRegistration, Action`1 emitMethod, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4656\r\n at LightInject.ServiceContainer.<>c__DisplayClass153_0.<CreateEmitMethodWrapper>b__0(IEmitter ms) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3856\r\n at LightInject.ServiceContainer.EmitConstructorDependency(IEmitter emitter, Dependency dependency) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4158\r\n at LightInject.ServiceContainer.EmitConstructorDependencies(ConstructionInfo constructionInfo, IEmitter emitter, Action`1 decoratorTargetEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4120\r\n at LightInject.ServiceContainer.EmitNewInstanceUsingImplementingType(IEmitter emitter, ConstructionInfo constructionInfo, Action`1 decoratorTargetEmitMethod) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4081\r\n at LightInject.ServiceContainer.EmitNewInstance(ServiceRegistration serviceRegistration, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4038\r\n at LightInject.ServiceContainer.<>c__DisplayClass153_0.<CreateEmitMethodWrapper>b__0(IEmitter ms) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3856\r\n at LightInject.ServiceContainer.CreateDynamicMethodDelegate(Action`1 serviceEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3777\r\n at LightInject.ServiceContainer.CreateDelegate(Type serviceType, String serviceName, Boolean throwError) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4743\r\n at LightInject.ServiceContainer.CreateDefaultDelegate(Type serviceType, Boolean throwError) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4705\r\n at LightInject.ServiceContainer.GetInstance(Type serviceType) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3437\r\n at Umbraco.Core.Composing.ComponentCollectionBuilder.CreateItem(IFactory factory, Type itemType) in D:\\a\\1\\s\\src\\Umbraco.Core\\Composing\\ComponentCollectionBuilder.cs:line 33\r\n at System.Linq.Enumerable.WhereSelectArrayIterator`2.MoveNext()\r\n at System.Linq.Buffer`1..ctor(IEnumerable`1 source)\r\n at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)\r\n at Umbraco.Core.Composing.ComponentCollectionBuilder.CreateItems(IFactory factory) in D:\\a\\1\\s\\src\\Umbraco.Core\\Composing\\ComponentCollectionBuilder.cs:line 25\r\n at Umbraco.Core.Composing.CollectionBuilderBase`3.CreateCollection(IFactory factory) in D:\\a\\1\\s\\src\\Umbraco.Core\\Composing\\CollectionBuilderBase.cs:line 117\r\n at LightInject.PerContainerLifetime.GetInstance(Func`1 createInstance, Scope scope) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 6169\r\n at LightInject.ServiceContainer.EmitLifetime(ServiceRegistration serviceRegistration, Action`1 emitMethod, IEmitter emitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4656\r\n at LightInject.ServiceContainer.<>c__DisplayClass153_0.<CreateEmitMethodWrapper>b__0(IEmitter ms) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3856\r\n at LightInject.ServiceContainer.CreateDynamicMethodDelegate(Action`1 serviceEmitter) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3777\r\n at LightInject.ServiceContainer.CreateDelegate(Type serviceType, String serviceName, Boolean throwError) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4743\r\n at LightInject.ServiceContainer.CreateDefaultDelegate(Type serviceType, Boolean throwError) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 4705\r\n at LightInject.ServiceContainer.GetInstance(Type serviceType) in C:\\projects\\lightinject\\src\\LightInject\\LightInject.cs:line 3437\r\n at Umbraco.Core.FactoryExtensions.GetInstance[T](IFactory factory) in D:\\a\\1\\s\\src\\Umbraco.Core\\FactoryExtensions.cs:line 22\r\n at Umbraco.Core.Runtime.CoreRuntime.Boot(IRegister register, DisposableTimer timer) in D:\\a\\1\\s\\src\\Umbraco.Core\\Runtime\\CoreRuntime.cs:line 187\r\n at Umbraco.Core.Exceptions.BootFailedException.Rethrow(BootFailedException bootFailedException) in D:\\a\\1\\s\\src\\Umbraco.Core\\Exceptions\\BootFailedException.cs:line 64\r\n at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()\r\n at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)\r\n at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Something is running more instances than it should.
The solution has SeoChecker installed, and I'm seeing a large number of 404 requests with all are being logged by SeoChecker. Otherwise it could be some code running async that is not being handled properly.
I spent a lot of time in the server log looking at the events around the time of the crashes.
I think we did something to log every requested url along with the response code. That way we could look for urls that never got a response code, since the site crashed.
It was an absolute horror to find, since it was only this particular url that caused the error.
Site suddenly starts exceed pool limits and fail to boot
Hi there
We have a site that's been running fine for months, but suddenly on the 19th of october and ever since, the site has been down multiple times a day.
The pattern is the same every time when looking in the log:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
System.TimeoutException: Failed to enter the lock within timeout.
Umbraco.Core.Exceptions.BootFailedException: Boot failed. ---> System.TimeoutException: Cannot acquire MainDom
After a retry the site will get back up by itself, but is down in the meantime. There's nothing prior to this in the log explaining it.
The site is running on a VM in Azure, with a seperate Azure VM as SQL server. It is a high demand site, but the cpu and memory usage on the server is at 20% at most.
The Umbraco version is at 8.6.8.
Any quailified guesses why this is happening, and why it started happening all of a sudden?
Full logs:
Timeout (there are many variations of this with different strack trace):
Failed to enter lock:
Boot failed:
Hi OleP
This all seems related to the number of connections available in your SQL Server instance, when Umbraco starts up it uses a SQL query to obtain a 'MainLock', if SQL Server has no connections available, Umbraco cannot start.
When you try again presumably one is free and it can start.
Similarly the other error is when a 404 is about to happen, Umbraco's content finder for redirects will lookup a SQL database table to see if a redirect exists for the Url before returning a 404.
You can run the following SQL on the SQL Server instance, when the issue occurs to see how many connections are in place, and see if over time this increases and never decreases:
SELECT DB_NAME(dbid) as DBName, COUNT(dbid) as NumberOfConnections, loginame as LoginName FROM sys.sysprocesses WHERE dbid > 0 GROUP BY dbid, loginame
Essentially if this number keeps increasing, some code somwhere in the solution, or another solution that is using the database, is grabbing a connection to the db and not releasing it properly, meaning another connection in the pool cannot immediately be made until the blocking connection timesout...
So do you have any custom code that accesses the database?
In theory it could be a bug in Umbraco, but you'd expect there to be alot more issues like this reported if that were the case?
Anyway, hope that helps investigate more...
regards
marc
Hi Marc
Thanks for the reply.
From what we've seen, the number of connections is never above 10. Afaik the maximum available connections is 32,767.
We do have custom code on the solution, and all data is stored in a separate table than the Umbraco db. All of our custom code is logged, and nothing shows up in the log when the issue happens.
Since the original post, we've start seeing another error that may be related:
Something is running more instances than it should.
The solution has SeoChecker installed, and I'm seeing a large number of 404 requests with all are being logged by SeoChecker. Otherwise it could be some code running async that is not being handled properly.
Turned out to be some json serilization of NestedContent that created a stackoverflow.
Hi OleP,
I've been trying to track down this issue for quite some time. I'm also running an Auzre VM with custom tables etc.
Do you remember how you tracked down the json serialization issue and what you did to correct it?
Thanks
Hi David
I spent a lot of time in the server log looking at the events around the time of the crashes.
I think we did something to log every requested url along with the response code. That way we could look for urls that never got a response code, since the site crashed.
It was an absolute horror to find, since it was only this particular url that caused the error.
Good luck!
Thanks OleP,
I've resorted to the same sort of logging to figure it out too. Was hoping for a magic answer :)
Thanks for your help.
is working on a reply...