After fresh deployments of our production site (during which ~\App_Data\ is deleted), the frontend of the site sometimes refuses to load and Umbraco throws this exception: "The Xml cache is corrupt. Use the Health Check data integrity dashboard to fix it". It's a bit of a show-stopper and it keeps happening, so I'm trying to find the root cause and prevent it from occurring.
The Health Check Data Integrity dashboard always shows all green ticks, which makes sense since it says right on the dashboard "This does not check the data integrity of the xml cache file", which is apparently what's causing the exception.
We're using TLS as we normally would and have umbracoUseSSL = true, so it's not due to that. We have a functioning workaround, which is to empty the recycle bin and then hit that "republish entire site" button, which always sorts things out, until the error randomly happens again a few weeks later (after many, many successful redeployments, I might add).
I would really appreciate any input or insight that anyone might have into:
Why the issue could happen in the first place
How we can detect it in advance or investigate the problem in some other direction, or catch / log / measure correlated things
Ideally, how to find the root cause, handle that situation and prevent the whole issue
Some other thoughts:
We're running Umbraco 7.15.0. The site runs in a single-instance Azure Web App and we're using a CI/CD blue/green style deployment from Azure DevOps. This means the whole site is redeployed from scratch after a merge is made into the master branch. There are two deployment slots: we deploy into a staging slot, warm it up, and then swap it with the production slot. We have quite a few content editors, so inevitably deployments occur while they're in the middle of their work. Typically this is fine, but I wonder whether a lot of preview/publish work at the same time as a deployment slot swap could have something to do with this?
I've seen that error a few times in the last few days, after installing uSync, starting the site, running an export, then restarting the site. Uninstall uSync, remove all related files, and site restarts fine.
I don't think uSync is the cause, but it's causing something else to flip out...
Hmm, interesting! We use uSync too, but I'm a little hesitant to just remove it, as uSync is extremely handy for treating config changes as code. Not without a viable alternative, anyway.
The weird thing is that most of the time there aren't any issues, and the recent deployments that have had this Xml cache corruption issue contained no uSync changes, so I figure it wouldn't have been doing anything differently to the previously-successful deployments...
Is it specifically your editors seeing this error, or does it happen for other users too? There was an issue with preview in 7.15.0 (fixed in 7.15.1) which would throw this exception when it failed to generate the preview version of the XML cache.
I think that emptying the recycle bin would clear that problem, since it was related to wrongly trying to generate preview data for deleted items.
Thanks for the tip about 7.15.1, hopefully this is the issue you mentioned and it's an easy fix! It just happened again, we hadn't even done a deployment but the site took ages to load so it had somehow restarted (which is weird in itself...)
Looks like at the moment the issue is just affecting editors, and only when previewing (either via the backoffice, or accessing the site directly in preview mode). I actually remembered to grab a database export and a copy of /App_Data/ this time before fixing the problem, so hopefully I get a bit further with repro.
And as before, emptying the recycle bin sorted things out immediately.
Forgot to update the solution here - we installed the 7.15.1 update and these errors seem to have stopped occurring. Thanks Steve, you were right on the money
Yeah, we haven't seen it happen once since installing the update. I never did figure out what could have been causing it or how to investigate further, although perhaps you might start with whatever was changed in 7.15.1 to fix the bug, at least it might point you in the right direction. In case you haven't already, perhaps check out some of the other solutions proposed here and here.
Does your problem also mysteriously vanish if you empty the recycle bin and republish?
Edit: the original issue for the bug fixed in 7.15.1 is here and the commit is here. Might provide some other ideas to investigate? E.g. I did not know this but 7.15 has a new preview engine, and you can turn it off by adding this to your appSettings:
<add key="Umbraco.Preview.Mode" value="Legacy" />
although it sounds like doing this is discouraged.
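For anyone unsure where that key goes, here is a minimal sketch, assuming a standard Umbraco 7 web.config layout. Only the key itself comes from this thread; the surrounding elements are illustrative and your existing keys are omitted:

```xml
<configuration>
  <appSettings>
    <!-- Existing appSettings keys stay as they are (omitted here). -->
    <!-- Key quoted in this thread: reverts 7.15's new preview engine to the legacy one. -->
    <add key="Umbraco.Preview.Mode" value="Legacy" />
  </appSettings>
</configuration>
```

Editing web.config restarts the application, so it's probably best treated as a temporary workaround until the underlying preview issue is fixed.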
I also somehow corrupted my XML cache on 7.15.3. I have to admit that I was abusing my (development) instance a little, like killing the application pool, deleting everything from temp etc, but I never had issues with that before.
This happened on a (local) development SQL Server 2017 instance.
I have just tested the main functionality in 7.15.5 and noticed a constant "Xml cache is corrupt" error in preview mode. This is a dev environment that I have just upgraded from 6.2.6. The website originally came from v4, but preview was working well in 6.2.6.
I have removed the items in the Recycle Bin, removed umbraco.config, restarted the app pool and rebuilt the indexes, and the problem still occurs.
The only solution is to set Legacy preview mode.
Maybe we have unnecessary or missing records in cmsPreviewXml. Perhaps it's possible to check this against the cmsDocument table?
This is another bug in the v7 core, and I can't understand how the latest version passed testing, because I have a common Umbraco setup.
I have just fixed it, but a PR for v7 is not expected any more.
The error in my scenario is:
2020-10-28 10:02:30,313 [P7608/D2/T28] ERROR umbraco.content - Error loading preview content
System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at Umbraco.Core.Persistence.Database.BuildSqlDbSpecificPagingQuery(DBType databaseType, Int64 skip, Int64 take, String sql, String sqlSelectRemoved, String sqlOrderBy, Object[]& args, String& sqlPage)
at Umbraco.Core.Persistence.UmbracoDatabase.BuildSqlDbSpecificPagingQuery(DBType databaseType, Int64 skip, Int64 take, String sql, String sqlSelectRemoved, String sqlOrderBy, Object[]& args, String& sqlPage)
at Umbraco.Core.Persistence.Database.BuildPageQueries[T](Int64 skip, Int64 take, String sql, Object[]& args, String& sqlCount, String& sqlPage)
at Umbraco.Core.Persistence.Repositories.ContentRepository.BuildPreviewXmlCache()
at Umbraco.Core.Services.ContentService.BuildPreviewXmlCache()
at umbraco.content.LoadPreviewXmlContent()
2020-10-28 10:02:30,317 [P7608/D2/T28] ERROR Umbraco.Core.UmbracoApplicationBase - An unhandled exception occurred
System.Exception: The Xml cache is corrupt. Use the Health Check data integrity dashboard to fix it.
at Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedContentCache.GetXml(UmbracoContext umbracoContext, Boolean preview)
at Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedContentCache.HasContent(UmbracoContext umbracoContext, Boolean preview)
at Umbraco.Web.PublishedCache.ContextualPublishedCache`1.HasContent(Boolean preview)
at Umbraco.Web.UmbracoModule.EnsureHasContent(UmbracoContext context, HttpContextBase httpContext)
at Umbraco.Web.UmbracoModule.EnsureUmbracoRoutablePage(UmbracoContext context, HttpContextBase httpContext)
at Umbraco.Web.UmbracoModule.ProcessRequest(HttpContextBase httpContext)
at Umbraco.Web.UmbracoModule.<Init>b__12_3(Object sender, EventArgs e)
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Recurring "Xml cache is corrupt" issues
Any plugins/packages running on the site?
Did updating Umbraco solve your problem?
I'm experiencing a similar problem but am running Umbraco 7.15.3.
Same issue here with 7.15.3. Funny thing is, it happens if my DB is in Azure but not on a local server.
I'm getting the same issue on a live site now, added more info to this post https://our.umbraco.com/forum/using-umbraco-and-getting-started/85834-the-xml-cache-is-corrupt-use-the-health-check-data-integrity-dashboard-to-fix-it
But no idea what is causing it :( Did you manage to get to the bottom of it at all?