All document links are a hash (#), and umbraco.config does not generate.
I am currently running Umbraco version 4.7.1.1, and I am working on a large Umbraco installation (around 302,000 nodes). The site has been down for a week. Basically the umbraco.config got corrupted, and in the Properties tab all document links are a hash (#).
Link to document #
This has happened before, but in the past Umbraco would just rebuild the umbraco.config file, and by the next morning it would be working again. This has stopped happening. Umbraco is no longer attempting to rebuild the config file, but I don't see anything in the umbracoLog that would explain why. So I am currently trying to republish the 300K documents manually, but that's going to take days.
Please advise:
Why do all the links become #, and why would the umbraco.config file not rebuild?
Can a single Umbraco instance handle over 300,000 nodes?
Hi Anthony,
I'm experiencing the exact same issue. Have you managed to resolve this?
Try to hit http://YOURDOMAIN/umbraco/dialogs/republish.aspx?xml=true
It's going to take time, but it will rebuild the whole XML content. That should fix your issue.
If Umbraco does not rebuild the umbraco.config file it usually means that some XML is corrupted and the file cannot be created. Rebuilding the XML content for each document should fix the issue.
Umbraco should handle 300,000 nodes _but_ then if you do things like add a property to a doctype while users edit content, there are some concurrency issues that can trash the XML of one node... and one trashed node is enough to prevent the whole umbraco.config file from being generated.
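Since one trashed node is enough to block the whole file, it can help to find that node before republishing everything. Here is a minimal sketch in Python (not Umbraco code), assuming you have already exported (nodeId, xml) pairs from the cmsContentXml table by whatever means you like. The function name and the sample rows are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_malformed(rows):
    """Return (node_id, line, column, message) for every fragment that fails to parse."""
    bad = []
    for node_id, xml_text in rows:
        try:
            ET.fromstring(xml_text)
        except ET.ParseError as err:
            # ParseError.position is a (line, column) tuple pointing at the failure.
            line, column = err.position
            bad.append((node_id, line, column, str(err)))
    return bad


# Hypothetical sample rows; real rows would come from your own export.
rows = [
    (1051, '<node id="1051" nodeName="Home"><data alias="title">Home</data></node>'),
    (1052, '<node id="1052" nodeName="News"><data alias="title">News</node>'),  # broken on purpose
]

for node_id, line, column, message in find_malformed(rows):
    print(f"node {node_id}: line {line}, column {column}: {message}")
```

Any node IDs it reports are candidates for a targeted republish, which is a lot cheaper than redoing 300K documents.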
The only way I resolved it was to republish everything manually. It took about a week. At 300K nodes I couldn't republish everything in one shot. It would just time out. So I had to republish in smaller chunks, but I eventually republished everything.
The same thing happened again this past Friday. (I foolishly tried to rename a folder.) I originally thought the problem was a corrupted umbraco.config, so I saved earlier copies hoping I could just restore it, but that didn't work. I now think the problem is a corrupted cmsContentXml table.
I tried to rebuild the cmsContentXml table using http://<your domain>/umbraco/dialogs/republish.aspx?xml=true, and it was working for a while, but then it timed out. That just made things worse. Now I can't even log into the Umbraco site. I get a SqlException (0x80131904): Timeout expired. I will have to restore everything from an earlier backup.
Not too happy with Umbraco right now. If you have a lot of content, it just can't handle it.
Stephen, I have been doing the republish for about an hour now. Still going on.
There are two things I've noticed though. My umbraco.config is not getting any larger or smaller. The update date on umbraco.config is not recent.
So I've been monitoring the umbracoLog table. And right now all I see is errors. Errors on the root node (-1) relating to UmbracoExamine.
This is what it says: [UmbracoExamine] Error indexing queue items,There is an error in XML document (9, 123)., IndexSet: InternalIndexSet
Is this influencing the updating of the umbraco.config? In other words, can I just cancel my republish or do I let it finish?
@Anthony: I feel your pain man :(
Oh another thing: I can't republish nodes manually. Not all of them. Of course the ones that I cannot republish manually are the bulk ones. I just get an error.
Any help greatly appreciated.
Server Error in '/' Application.
Value cannot be null.
Parameter name: attribute
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ArgumentNullException: Value cannot be null.
Parameter name: attribute
Source Error:
An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Stack Trace:
Ok so the republish has finished. No difference at all. Still get hashed links. I am now able to manually publish a node though. But still no luck. Even after manual publish of an individual node I still get a hashed link for that node.
Right now I'm leaning towards making a usercontrol to fetch my content directly from the database based on nodeId in the querystring as all my content seems to be fine in the database. It's just the umbraco.config/index that seems to be fouled up.
One thing I've noticed about republishing: you have to make sure any folders or nodes above the ones you're trying to publish are already published. Otherwise it won't build the document node path, and you'll still have the hash link.
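The parents-first rule, combined with republishing in smaller chunks, can be sketched like this. It is plain Python, not Umbraco API code, and it assumes each node carries a comma-separated ancestor path such as "-1,1051,1060". The helper names and the sample nodes are hypothetical.

```python
from itertools import islice


def publish_order(nodes):
    """Sort nodes shallowest first, so every ancestor precedes its descendants."""
    return sorted(nodes, key=lambda n: (len(n["path"].split(",")), n["id"]))


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical nodes; "path" lists the ancestors from the root (-1) down to the node itself.
nodes = [
    {"id": 1075, "path": "-1,1051,1060,1075"},
    {"id": 1051, "path": "-1,1051"},
    {"id": 1060, "path": "-1,1051,1060"},
    {"id": 1090, "path": "-1,1090"},
]

for batch in batches(publish_order(nodes), 2):
    # Each batch would be handed to whatever publish routine you use.
    print([n["id"] for n in batch])
```

Sorting by depth is the simplest ordering that guarantees a folder is handled before anything inside it. The batch size is just a knob to keep each run under the timeout.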
Some good news: I can log into Umbraco site now. I'm no longer getting the timeout trying to log in. Don't know why; it just is what it is.
I added a user control to Dashboard.config, running code similar to http://<your domain>/umbraco/dialogs/republish.aspx?xml=true, but with ScriptTimeout = int.MaxValue, so it (hopefully) doesn't time out like last time.
We have a problem similar to the one above - probably hundreds of thousands of nodes and periodic crashes w/# tags showing as the path of nodes. We then have to republish our entire site, which is time consuming and causes severe problems and disruptions on both the frontend and backend. One can see the size of our site below. We have over 50 sites in one Umbraco instance. My question concerns the comment by Stephen of Pilotine in France who stated:
"Umbraco should handle 300,000 nodes _but_ then if you do things like add a property to a doctype while users edit content, there are some concurrency issues that can trash the XML of one node... and one trashed node is enough to prevent the whole umbraco.config file from being generated."
Stephen,
Would you please elaborate on your comment above, specifically the concurrency issue? Your answer seems to suggest that you cannot make any changes to document types in the production environment while people are editing w/out courting disaster. Why is this an issue with 300,000 nodes vs. an average site of, let's say, 30,000 nodes? Are you saying in effect that Umbraco does not scale well? What can you recommend to deal w/this problem other than locking out users from the back end when changing document types?
Thanks.
Richard Barg
@Richard: what I meant by "concurrency issue" is that as of 4.x various operations in Umbraco, such as editing a doctype or saving a document, are not atomic (think database transaction), so Umbraco can be both updating content due to a doctype edit and updating content due to a save, leading to unpredictable results. Usually the result is corrupted XML content for one node. This becomes an issue when you have lots of nodes because then operations such as editing a doctype start to take time--and the longer they take, the more chances there are that someone edits some content in the meantime.
So yes, I guess that's a scalability issue... and the only workaround that I'm aware of is to lock users out of the backend while doing some changes.
Does that make sense to you?
The good side is that, as far as I understand, the new API that comes with 6.x does support unit-of-work or transaction concepts and so should not have the same issues (needs to be tested, though).
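To put a rough number on "the longer it takes, the more chances there are": a back-of-the-envelope sketch in Python, under the loud assumption that content saves arrive independently at a steady average rate (a Poisson model, which is my simplification and not anything measured on a real Umbraco site). The rates and durations below are made-up inputs.

```python
import math


def overlap_probability(edits_per_hour, operation_minutes):
    """P(at least one save lands during the operation), assuming Poisson arrivals."""
    expected_edits = edits_per_hour * operation_minutes / 60.0
    return 1.0 - math.exp(-expected_edits)


# Hypothetical figures: 6 saves per hour, doctype updates of different lengths.
for minutes in (0.5, 5, 30):
    p = overlap_probability(edits_per_hour=6, operation_minutes=minutes)
    print(f"{minutes:>4} min doctype update -> {p:.0%} chance of an overlapping save")
```

The point is only the shape of the curve: the risk is never zero for a small site, and it climbs quickly as the doctype update gets slower.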
Thanks, Stephen, for your clear response and sage advice.
Your recommendation to prevent this catastrophic occurrence, i.e. to lock out users in this situation, is exactly what we decided to do in a conference call. See also http://our.umbraco.org/projects/collaboration/content-freeze. My developer tells me (I'm the project manager / Director of Web Strategy for the site) that we have 20,000 nodes, not 300,000, suggesting that even a db of our size, roughly 5% of Anthony's, is susceptible to this problem. In our case, it takes 3-5 hours to republish all the nodes, and some nodes have the # for filename and others don't. It's sort of like a bomb goes off and shrapnel is sprayed randomly, hosing filenames. Or another analogy is of a blood clot that throws out bits of tissue, one piece of which lodges in the brain of the db and causes a stroke. This occurrence of a near-simultaneous updating of content and doctype can take place regardless of the number of nodes. The only issue is probability. No Umbraco installation is really 100% immune from these issues, which makes it understandable how there are so many reports of this happening (not just on this thread) - see Periodic Random Unpublished (?) Nodes with Hosed URLs in Umbraco 4.7.1
Umbraco bills itself as an enterprise CMS. Is the behavior above consistent w/that representation? I can't answer that. It's great that 6.x supports "unit-of-work or transaction concepts", but this does little to help the vast majority of installations that are and will remain on 4.x.
One other thing to note. We did not have this problem on 4.0x. The problems started in 4.7.1, and they have been a disaster time-wise and in terms of other resources that have been expended, the equivalent of being hit w/a hurricane with no Doppler radar to predict the occurrence.
I wonder whether the HQ people know about this, or if they know, whether the implications have sunk in.
@Richard: I think HQ is aware of the issue, and 6.x should be the solution. There was no way to fix it easily in 4.x. No way.
Hi,
On upgrading from 4.7.1 to 6.0, I think a lot of the packages installed in the current installation (4.7.1) will stop working on 6.0.
regards
mithun
We had the same problem, and the live site was down because of mars characters. The problem can be solved if you have the same issue. Try to open the database with Management Studio and look into the umbracoLog table. If there are some errors (exceptions) there, they will help you figure it out.