Umbraco, scalability and the XML file
Hi all,
We are looking to convert a news website to Umbraco 4.5.2 and I wanted to ask a few questions about how scalable it is likely to be. The website currently contains tens of thousands of pages and I want to make sure that, as it grows, it is not going to slow down due to the size of the XML file.
What is the reason for the XML file, and can it be changed so that the site talks directly to the database? My director is concerned that as the site gets larger the XML file is going to become very big, so reading data from it will take a long time compared with talking directly to the database.
Can someone please explain how it all works and whether this is likely to be a problem? I'm sure there is a very good reason for having the whole site cached in an XML file. I would just like to know what that reason is, and whether it is going to have any adverse effect on performance in either the backend or the frontend of the website.
Regards
Tony
No takers on this one?
Hey Tony,
Tens of thousands shouldn't be a problem.
You can tweak Umbraco if it becomes a problem (though I doubt reading directly from the database will help). There's an interesting video with a guy from Microsoft here: http://codegarden11.com/sessions/day-2/slot-one/multi-environment-team-based-development-with-umbraco-at-microsoft.aspx. They have 190,000 nodes in Umbraco.
I know that Glamour magazine has around 120,000 content nodes, and there are many more examples.
Rich
I should add that the XML in the file is held in memory, which is why it performs so well.
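To illustrate the idea, here is a rough sketch in Python. It is not Umbraco's actual code: the XML layout, the `index` dictionary and the `find_node` helper are all made up for the example. The point is only that the XML is parsed once into memory, and each lookup is then answered from the in-memory structure without going to the disk or the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for a published-content cache file.
CACHE_XML = """
<root>
  <node id="1" name="Home">
    <node id="2" name="News">
      <node id="3" name="Story one" />
      <node id="4" name="Story two" />
    </node>
  </node>
</root>
"""

# Parse once at start-up; the tree then lives in memory.
tree = ET.fromstring(CACHE_XML)

# Build an id -> element index so lookups do not walk the whole tree.
index = {el.get("id"): el for el in tree.iter("node")}


def find_node(node_id):
    """Return the name of a node from the in-memory index, or None."""
    el = index.get(str(node_id))
    return el.get("name") if el is not None else None


print(find_node(3))
```

The cost of parsing is paid once; after that, serving a page only needs a lookup in memory, which is why the size of the file on disk matters much less than you might expect.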
Rich
Hi, thank you for the response. We have been doing some testing using Umbraco's publish node facility and found that it takes approximately 0.75 seconds to publish a node. If this had to be done on a site of 60,000 nodes, it would take approximately 12.5 hours to publish all the nodes. I know this is unlikely to happen, but if we did have to republish all the nodes, due to either an error with Umbraco or a user error, this would be too long a period for the site to be down. Is there any way of speeding up this process?
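For reference, this is how the 12.5 hour figure is worked out, as a small Python sketch. It assumes a constant cost per node and that nodes are published one at a time, which is a simplification; the function name `republish_hours` is just for illustration.

```python
SECONDS_PER_NODE = 0.75  # measured time to publish one node
NODE_COUNT = 60_000      # number of nodes on the site


def republish_hours(seconds_per_node, node_count):
    """Estimated hours to republish every node, one after another."""
    return seconds_per_node * node_count / 3600


print(republish_hours(SECONDS_PER_NODE, NODE_COUNT))
```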
Hey Tony,
I'm not sure about this, but I remember the Microsoft guy did something to speed up the XML file build process. That video will give you lots of ideas, I'm sure.
Rich
There is no way of bypassing the XML file when using Umbraco since that XML file is used extensively by Umbraco itself and also by any macros / XSLT you might write yourself.
Generally Umbraco is optimised for 'reading' (serving pages) more than it is for 'writing' (adding new pages). This is so your pages are served fast to your clients. Even with thousands of pages Umbraco can still render each page very quickly. Yes, a large number of nodes can slow down publishing - but it's not a linear equation like you make out. You can't simply multiply the time it takes to publish one node by total nodes. Also, don't forget, generally you don't need to republish the entire site every time - if you update or add one page then only one node gets published.
Thank you for the response Rich, I will take a look at the video when I get a chance.
Dan, I realise that it doesn't take longer to publish one node because you have got lots of nodes. My point is that we have had situations before where problems with the database or Umbraco meant that pages were not published properly, and we then had to publish all the pages again. The worry is how long this process would take on a very large site.