Hey everyone, got a bit of an odd one and I'm at a loss really.
I've got a site running Umbraco 8.6 on an Azure Web App, and frequently but seemingly at random, content is "going missing". It's still in the backoffice but gives a 404 on the front end. When investigating, the page shows the following:
This document is published but its url cannot be routed
The quick fix is to republish the page, but the client is getting annoyed that this happens over and over, since they can't be expected to constantly check every link on their site, which I get. This is the only v8 site this is happening on, and it's a single-language "simple" site.
All the App Settings for Azure are set up correctly, including the new MainDom lock setting.
Has anyone got ideas on ways to investigate why this is happening, or what to look for? I'm out of ideas!
That particular message is thrown here:
When DetectCollision() is called on an item...
Is it always the same page? If so, is there any template name, umbracoUrlAlias or custom content finder that could be 'finding' a different node that isn't published or has no template?
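To make the collision idea concrete, here is a simplified Python model of a URL round-trip check. It is not Umbraco's actual C# DetectCollision code: each published node's URL is resolved back through a router, and any node whose URL resolves to a different node (or to nothing) is flagged. The node ids, URLs and the "first match wins" rule are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Simplified model (assumption): a node is (id, url), listed in routing order.
Node = Tuple[int, str]


def build_router(nodes: List[Node]) -> Dict[str, int]:
    """Map each URL to the first node that claims it (first match wins)."""
    router: Dict[str, int] = {}
    for node_id, url in nodes:
        router.setdefault(url, node_id)
    return router


def detect_collisions(nodes: List[Node], router: Dict[str, int]) -> Dict[int, Optional[int]]:
    """Return {node_id: routed_id} for nodes whose URL does not route back to them.

    routed_id is None when the URL routes nowhere (e.g. stale routing data).
    """
    problems: Dict[int, Optional[int]] = {}
    for node_id, url in nodes:
        routed = router.get(url)
        if routed != node_id:
            problems[node_id] = routed
    return problems


if __name__ == "__main__":
    nodes = [(1050, "/about/"), (1062, "/about/"), (1070, "/contact/")]
    print(detect_collisions(nodes, build_router(nodes)))  # {1062: 1050}
    stale_router = {"/about/": 1050}  # routing data missing an entry
    print(detect_collisions(nodes, stale_router))  # {1062: 1050, 1070: None}
```

The two results correspond to the two possibilities being discussed: a genuine URL clash (the URL resolves to another node), or routing data that no longer knows about a document that is itself fine.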
If it's different pages, then instead of immediately republishing the page, check the status of NuCache...
Does rebuilding the NuCache cache make the page available again?
And does it mainly affect freshly created pages, or are quite old pages just as likely to go missing? Essentially, the theory is that it is perhaps related to a server transitioning and not building NuCache correctly.
Publishing obviously puts the item straight back into the published cache (but perhaps not into the NuCache files)...
...so if you create a new page, publish it, then trigger a recycle of the app pool, does it remain published?
Yeah, it's really odd.
It's not the same page, but some do seem to be reported more often than others. The site doesn't have any custom content finders or umbracoUrlAlias properties set up, and I "don't think" there are any potential conflicts with template names that I've been able to spot.
I've asked the client to let us know next time they spot it so we can investigate while it is happening, but we've asked that in the past and they've still republished the page. That makes it much harder to investigate!
I've now installed the NuCache Explorer package in the site to try and let us investigate the actual cache itself when it's happening.
I agree with the theory that it's related to a server transition/Azure change happening behind the scenes, but my knowledge of how to trace something like that is limited. I'm looking at potentially running a custom build of Umbraco with extra logging added around the area you've linked, to try and see more about what is happening.
I suspect it is happening to more than just the pages we are being notified about, but it only gets spotted when it happens to a page in the main site navigation.
The following 3 things all bring the page back:
It does seem NuCache related rather than a clash then... (I've had clashes before in a previous version, where the last-published clashing version took precedence, so republishing temporarily resolved the problem).
Agree that finding out what is in NuCache when the issue occurs + extra logging is the next step.
Just thinking, make sure the site isn't 'scaled out' in Azure (I'm sure it isn't... but...)
Did it occur before using the SQLMainDomLock?
Does it occur with IgnoreLocalDb set to true? https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps#v860---v861
(You shouldn't need to set this if it's a single Web App, but I'd be interested to see if it has an effect. Also set up applicationinitialization, https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps#v860---v861, so that if the web app does transition, it should be warmed up first.)
Also, do any dev machines or staging/preview environments connect to the same database? (Check the umbracoServers table.) For example, are we accidentally load balancing, with this server considering itself a replica, so that SQLMainDomLock won't work (fixed in 8.6.2)?
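If you export the umbracoServers rows (e.g. with a plain SELECT), one quick sanity check is whether more than one distinct machine is registered as active. The sketch below assumes the rows have been loaded into Python with fields named after the columns I believe the table has (computerName, isActive); those names and the example machine names are assumptions, so check them against your own database.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class ServerRow:
    # Field names are assumptions modelled on umbracoServers
    # (computerName, isActive); verify them against your own schema.
    computer_name: str
    is_active: bool


def active_machines(rows: Iterable[ServerRow]) -> Set[str]:
    """Distinct machine names flagged as active (compared case-insensitively)."""
    return {row.computer_name.upper() for row in rows if row.is_active}


def looks_load_balanced(rows: Iterable[ServerRow]) -> bool:
    """True if more than one distinct machine is registered as active."""
    return len(active_machines(rows)) > 1


if __name__ == "__main__":
    rows = [
        ServerRow("RD0003FF1234", True),  # hypothetical Azure worker name
        ServerRow("DEV-LAPTOP", True),    # hypothetical dev machine
    ]
    print(looks_load_balanced(rows))  # True
```

A single active row suggests nothing else is sharing the database. Filtering on the active flag is deliberate, since an old, inactive row on its own doesn't mean another server is currently registering.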
Also, are there corresponding Umbraco restarts in the logs, i.e. "App is shutting down" messages?
Yeah it occurred before SQLMainDomLock, it's been happening since the site launched which was around version 8.4.*. We originally thought it was NuCache and locking related so had been hoping the 8.6 release would have fixed it, but it hasn't unfortunately.
I've not tried turning LocalDb off; that's definitely something to try.
I've just checked the umbracoServers table and it only contains a single instance, so I don't believe any other environments are pointing at it.
I've not been able to match app restarts in the logs to instances of pages vanishing, but I keep trying. The problem is that we never know exactly when they disappear.
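Since the disappearances can't be timed exactly, it may help to at least list every restart so they can be compared against when pages were reported missing. As far as I know, Umbraco 8 writes its trace logs as JSON lines (Serilog's compact format, under App_Data/Logs), where @t is the timestamp, @mt the message template and @m the rendered message. The sketch below assumes that format, and the default search phrase is also an assumption, so adjust it to whatever the shutdown entries in your logs actually say.

```python
import json
import sys
from typing import Iterable, List


def find_restarts(lines: Iterable[str], phrase: str = "shutting down") -> List[str]:
    """Return the @t timestamp of every log event whose message contains `phrase`.

    Assumes one compact JSON (CLEF) event per line; blank or unparseable
    lines are skipped rather than treated as errors.
    """
    needle = phrase.lower()
    hits: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        # Search both the template (@mt) and, if present, the rendered message (@m).
        text = f"{event.get('@mt', '')} {event.get('@m', '')}".lower()
        if needle in text:
            hits.append(event.get("@t", "<no timestamp>"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_restarts.py <path to an UmbracoTraceLog .json file>
    with open(sys.argv[1], encoding="utf-8") as log:
        for timestamp in find_restarts(log):
            print(timestamp)
```

Running it over each day's log file gives a list of restart times to hold up against the next reported 404.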
Did also see this: https://github.com/umbraco/Umbraco-CMS/pull/7907
Are the paths OK for the content items that disappear, e.g. matching their parent and level correctly?
There is a SQL gist to provide a quick check: https://gist.github.com/Shazwazza/1a04dcd1c1b6f16d3b7f167874770a84
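The gist linked above is a SQL check; as an illustration of the same kind of rules (not a port of the gist), here is a Python sketch, assuming rows exported from umbracoNode with id, parentId, level and path. As I understand it, a content node's path is the comma-separated list of ids from the root (-1) down to the node itself, so the last segment should be its id, the one before it its parent, and the level should equal the number of segments minus one. The example ids are made up.

```python
from typing import Dict, List, NamedTuple


class Node(NamedTuple):
    # Fields modelled on umbracoNode columns (id, parentId, level, path).
    id: int
    parent_id: int
    level: int
    path: str  # e.g. "-1,1050,1062"


def path_problems(node: Node) -> List[str]:
    """Check a single node's path against its own id, parent and level."""
    try:
        parts = [int(p) for p in node.path.split(",")]
    except ValueError:
        return ["path is not a comma-separated list of ids"]
    problems: List[str] = []
    if parts[0] != -1:
        problems.append("path does not start at the root (-1)")
    if parts[-1] != node.id:
        problems.append("last path segment is not the node id")
    expected_parent = parts[-2] if len(parts) >= 2 else -1
    if expected_parent != node.parent_id:
        problems.append("parent id does not match the path")
    if len(parts) - 1 != node.level:
        problems.append("level does not match the path depth")
    return problems


def check_tree(nodes: Dict[int, Node]) -> Dict[int, List[str]]:
    """Return {node_id: problems} for every node that fails a check."""
    report: Dict[int, List[str]] = {}
    for node in nodes.values():
        problems = path_problems(node)
        parent = nodes.get(node.parent_id)
        # A child's path should be exactly its parent's path plus its own id.
        if parent is not None and node.path != f"{parent.path},{node.id}":
            problems.append("path is not the parent's path plus own id")
        if problems:
            report[node.id] = problems
    return report


if __name__ == "__main__":
    rows = [
        Node(1050, -1, 1, "-1,1050"),
        Node(1062, 1050, 2, "-1,1050,1062"),
        Node(1070, 1050, 3, "-1,1070"),  # deliberately inconsistent row
    ]
    print(sorted(check_tree({n.id: n for n in rows})))  # [1070]
```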
Okay, so this happened again, and I am now certain it's related to the in-memory instances of NuCache.
I was able to look into it before the client (or someone else) republished, so I could examine the NuCache database file (using the NuCache Explorer package). It contained the page that was returning a 404 with no problems: correct ID, single instance, etc.
Doing a "Memory Cache" reload in the backoffice resolves the 404 issue, which strongly implies that the issue lies with some sort of NuCache memory corruption.
At the point of doing this, the Published Cache Status was OK, with the same number of content items shown in the status as I saw when I examined the NuCache database.
Any thoughts on how we can examine the in memory instance of NuCache as opposed to the on disk version?
(never done this)
But can you 'collect' a NuCache snapshot...
Anything in the logs? and is it the same content missing or pattern of content?
Some pages seem to be affected more than others, but the only ones we "know" about are the ones in the main navigation, so we don't know if it's happening to pages deeper in the site. If the node has children, those children don't necessarily become unavailable too, so that's an odd one.
Hmm, I have no idea what "collecting" a snapshot would do or how you'd view the collected snapshot. I could try that next time and see what happens.
Nothing in the logs that I can see.