"Broken" application in a load-balanced environment after document type modification
We're using Umbraco 8.2.2 with a large number of nodes (tens or even hundreds of thousands) in a load-balanced environment. Operations such as creating, modifying, or deleting document types have always been very slow, sometimes taking 15 minutes or more, and they also spike CPU and memory usage on our servers.
Our setup is on AWS, with:
Two EC2 instances: one Master for content editors and one Front for customers, both Windows servers.
An RDS for the Umbraco database.
An application load-balancer to manage requests.
Lately, after we modify or create a document type, there is a temporary slowdown. The Master server then functions as expected, but the Front server malfunctions: it continuously "restarts" itself, and the application never finishes loading there.
Restarting IIS, the servers, and the database doesn't resolve this problem.
Our logs show:
Boot failures with a timeout exception related to acquiring a lock.
2023-08-21 05:27:50,652 [P6360/D10/T10] ERR Umbraco.Core.Runtime.CoreRuntime - Boot failed. (90193ms) [Timing 0596be1]
Umbraco.Core.Exceptions.BootFailedException: Boot failed. ---> System.TimeoutException: Failed to enter the lock within timeout.
at Umbraco.Core.AsyncLock.Lock(Int32 millisecondsTimeout) in D:\a\1\s\src\Umbraco.Core\AsyncLock.cs:line 114
at Umbraco.Core.MainDom.Acquire() in D:\a\1\s\src\Umbraco.Core\MainDom.cs:line 170
at Umbraco.Core.Runtime.CoreRuntime.AcquireMainDom(MainDom mainDom) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 232
at Umbraco.Core.Runtime.CoreRuntime.Boot(IRegister register, DisposableTimer timer) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 146
Warnings about the wait lock timing out and application shutdown.
2023-08-21 05:29:58,630 [P6360/D11/T24] WRN Umbraco.Core.Sync.DatabaseServerMessenger - The wait lock timed out, application is shutting down. The current instruction batch will be re-processed.
A single warning about loading content from local database.
2023-09-26 05:45:15,613 [P2368/D2/T1] WRN Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService - Loading content from local db raised warnings, will reload from database.
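To make it easier to see which of these entries dominate over time, here is a minimal Python sketch that tallies log lines by level and logger. The regular expression is an assumption based only on the three sample lines above (timestamp, `[P…/D…/T…]`, level, logger, message); the names `parse_line` and `count_by_level_and_logger` are hypothetical helpers, not part of Umbraco.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines in this post:
# 2023-08-21 05:27:50,652 [P6360/D10/T10] ERR Some.Logger.Name - message text
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[P(?P<pid>\d+)/D(?P<domain>\d+)/T(?P<thread>\d+)\] "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry, or None for non-matching lines
    such as stack-trace continuation lines."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_by_level_and_logger(lines):
    """Count entries per (level, logger) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["level"], entry["logger"])] += 1
    return counts


# The three entries quoted above, plus one stack-trace line that is ignored.
SAMPLE = [
    "2023-08-21 05:27:50,652 [P6360/D10/T10] ERR Umbraco.Core.Runtime.CoreRuntime - Boot failed. (90193ms) [Timing 0596be1]",
    "at Umbraco.Core.MainDom.Acquire() in D:\\a\\1\\s\\src\\Umbraco.Core\\MainDom.cs:line 170",
    "2023-08-21 05:29:58,630 [P6360/D11/T24] WRN Umbraco.Core.Sync.DatabaseServerMessenger - The wait lock timed out, application is shutting down. The current instruction batch will be re-processed.",
    "2023-09-26 05:45:15,613 [P2368/D2/T1] WRN Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService - Loading content from local db raised warnings, will reload from database.",
]

if __name__ == "__main__":
    for (level, logger), n in sorted(count_by_level_and_logger(SAMPLE).items()):
        print(level, logger, n)
```

Run against a full log file (for example by passing `open(path)` instead of `SAMPLE`), this would show whether the boot failures and the wait-lock warnings come from the same process IDs and time windows.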
AWS Performance Insights indicates that the most demanding SQL operations are:
SELECT value FROM umbracoLock WHERE id=@0
and
UPDATE umbracoLock SET value = (CASE WHEN (value=1) THEN -1 ELSE 1 END) WHERE id=@0
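To illustrate what these two statements do to a row, here is a minimal Python sketch that runs them against an in-memory SQLite table. SQLite is only a stand-in for our SQL Server database on RDS, and the table layout (`id`, `value`, `name`) and the lock name `ExampleLock` are assumptions for the demo. It shows only the value change, not SQL Server's row locking or transaction behaviour.

```python
import sqlite3

# The two statements reported by Performance Insights, with SQLite-style
# "?" placeholders instead of "@0".
READ_SQL = "SELECT value FROM umbracoLock WHERE id=?"
TOGGLE_SQL = (
    "UPDATE umbracoLock "
    "SET value = (CASE WHEN (value=1) THEN -1 ELSE 1 END) WHERE id=?"
)


def make_db():
    """Create an in-memory stand-in table with one lock row (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE umbracoLock ("
        "id INTEGER PRIMARY KEY, value INTEGER NOT NULL, name TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO umbracoLock (id, value, name) VALUES (?, ?, ?)",
        (1, 1, "ExampleLock"),  # hypothetical row, starting at value 1
    )
    conn.commit()
    return conn


def toggle_and_read(conn, lock_id):
    """Run the UPDATE once, then read the resulting value back."""
    conn.execute(TOGGLE_SQL, (lock_id,))
    conn.commit()
    return conn.execute(READ_SQL, (lock_id,)).fetchone()[0]


conn = make_db()
values = [toggle_and_read(conn, 1) for _ in range(4)]
print(values)  # [-1, 1, -1, 1]
```

Each UPDATE simply flips the value between 1 and -1. If that reading also holds on SQL Server, a value of -1 on its own may not mean a lock is stuck, but that is an assumption on my side rather than something confirmed.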
Examining the [UmbracoLock] table, which as far as I know changes frequently, shows consistently problematic lock states:
Since this isn't the first time we've encountered this malfunction, this time we tried to address it as follows:
We upgraded the EC2 instances to more powerful ones.
Modifications were made directly via the Master server, bypassing the load-balancer.
Customer traffic was reduced during the change (though not entirely stopped).
Minimal changes were implemented to limit potential issues - 2 data types, 1 document type.
Manual modifications we made to the [UmbracoLock] table were quickly reverted by Umbraco.
The only solution that worked was restoring a database backup, resulting in lost modifications.
Interestingly, a week before this issue, document type creation and modification were successful.
Basically, apart from the temporary slowness we experience every time, until a month ago we didn't encounter this problem at all.
This problem now hinders any modifications or additions to document types, making it a critical issue. We're seeking insights and advice.
Thanks!