  • Michael Perry 11 posts 43 karma points
    Jan 17, 2019 @ 13:40

    Recurring 503 errors with Azure App Service

    Hello! We have an Umbraco application that we are running on Azure as an App Service. We initially launched the site three years ago on Umbraco 7.2.8. The application contains one main site, and several additional root nodes that represent "microsites" using shared templates and media with the main site.

    Over the summer, we upgraded the application to Umbraco 7.11.1 and deployed the upgrade to production in early fall. Starting in late October, the client reported outages where the application would go offline, usually while they were working in Umbraco managing content. The outages typically last 5 to 20 minutes and return a 503 "Service Unavailable" error, although some have lasted longer than 20 minutes.

    If we catch it while it is unavailable, we can restart the application in Azure and it comes back online. If we don't move quickly enough, it usually self-corrects and comes back up on its own within 20 minutes of the start of the outage, without any manual intervention.

    Although we had deployed some relatively small changes before the outages started, the only change whose timing lines up with the start of the outages is the implementation of an SSL certificate for the site and all of the microsites. We are continuing to make and deploy changes to the application, and the code changes seem to have no direct impact on the outages.

    The outages have become more frequent over the last month or so, to the point where we now expect about one outage per day. We are not using load balancing, and the application appears to have plenty of resources: memory usage is steady at 25%, and CPU usage is typically under 4% with occasional spikes up to 7%.

    We have not been able to identify anything helpful in either the Umbraco logs or the Azure app logs. The only thing that shows up consistently in the logs every time we see an outage is the following (Keep Alive URL obfuscated for security):

    2019-01-15 19:21:18,516 [P15832/D4/T81] ERROR Umbraco.Web.Scheduling.KeepAlive - Failed (at "https://XXXXXXXXXX.com:443/umbraco").
    System.Threading.Tasks.TaskCanceledException: A task was canceled.
       at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Umbraco.Web.Scheduling.KeepAlive.<PerformRunAsync>d__4.MoveNext()
    

    This error repeats several times in the logs while the application is offline, and then disappears once the app is back online. We have been chasing this for months now, and we still have no idea how to proceed. We recently installed New Relic APM, which has given us some more insight into what the application was doing when an outage started, but we have still been unable to uncover any red flags. Has anyone seen issues like this while using Azure? Any recommendations for things we can try next?

    Thanks for reading, and thanks in advance for any assistance.

  • Steve Smith 75 posts 158 karma points
    Jan 18, 2019 @ 10:29

    Interested in your post, because we also have an Umbraco Azure App Service instance, which has broken (returning 503 errors) for the past 2 nights running.

    Prior to this, it had been working perfectly since October.

    No code deployed since 7 January. Ours isn't auto-healing either. The site breaks and stays broken until we restart the app service.

    One of my colleagues thinks it could be memory related, but we're still looking into it.

    If you find any answers, could you please share them? Thanks!

  • Dimitar Dyankov 3 posts 75 karma points
    Jan 30, 2019 @ 16:19

    Hey,

    We have the same issue. We recently decided to use CI with Azure App Services, and the sites keep going down for 1/2 minutes every now and then. We kept seeing errors about scheduled publishing, "A task was canceled", and others.

    Suggestions online said that we probably had the wrong permissions or that something was misconfigured; however, the same site ran perfectly fine on IIS before.

    In addition to this, we started monitoring the staging site as well as the live site to see whether it also goes down. The staging site does not go down the way the live one does, which makes me think it has something to do with memory, or with something the users are doing that causes Umbraco to restart. Both live and staging are running the same code base.

    We also investigated the possibility of it being the file change notification. We tried turning it off, but that did not stop the issue either.

    We have this issue happening on two sites, one on version 7.5.4 and another on 7.11.1.
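
Since the KeepAlive error quoted in the first post is the one signal that lines up with every outage, it may help to turn those log entries into a list of outage windows that can be compared against deployments, Azure events, or APM traces. Below is a minimal Python sketch of that idea, not anything Umbraco or Azure provides. It assumes the log header format matches the quoted entry and that lines are in chronological order; the function name and the 10-minute grouping gap are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the header of the entry quoted in the thread, e.g.
# 2019-01-15 19:21:18,516 [P15832/D4/T81] ERROR Umbraco.Web.Scheduling.KeepAlive - Failed (...)
KEEPALIVE_FAILURE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \[[^\]]*\]\s+ERROR\s+"
    r"Umbraco\.Web\.Scheduling\.KeepAlive - Failed"
)


def keepalive_failure_windows(lines, max_gap=timedelta(minutes=10)):
    """Group KeepAlive failure timestamps into candidate outage windows.

    Failures no more than max_gap apart (an assumed threshold) are treated
    as one window. Lines are assumed to be in chronological order.
    Returns a list of (first_failure, last_failure, failure_count) tuples.
    """
    windows = []
    for line in lines:
        match = KEEPALIVE_FAILURE.match(line.strip())
        if not match:
            continue  # stack-trace lines and unrelated log entries
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if windows and stamp - windows[-1][1] <= max_gap:
            # Close enough to the previous failure: extend that window.
            first, _, count = windows[-1]
            windows[-1] = (first, stamp, count + 1)
        else:
            # Otherwise start a new window.
            windows.append((stamp, stamp, 1))
    return windows
```

To use it, pass the lines of a trace log file, for example `keepalive_failure_windows(open(path, encoding="utf-8"))`, where `path` points at whichever log file holds these entries. A window's start time gives a rough lower bound for when the site stopped answering its own keep-alive ping.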
