Large Site Post Mortem
So I thought it would be helpful for other users if I gave a rundown of the do's and don'ts I personally ran into on a large site we developed.
If you want to check it out: http://www.worldoil.com
We were responsible for developing a large news site in Umbraco. On top of that, the client had 30,000+ old articles and news posts that needed to be migrated from Ektron. The Ektron side proved to be the most difficult part, so I won't really get into that. ContentService ended up working pretty well for the import, although it did have some quirks: it allowed me to create documents without a title that then errored when I tried to delete them, and when setting CreateDate the node needed to be saved twice for the value to stick. The 301 URL Tracker plugin, I will say, was easy to use and quite fast.
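To illustrate the CreateDate quirk, here is a minimal sketch of an import step using the Umbraco 7 ContentService (the "newsArticle" and "bodyText" aliases and the variable names are made up for the example, not the real ones from our project):

```csharp
using Umbraco.Core;
using Umbraco.Core.Models;

// Hypothetical import step -- aliases are example values.
var cs = ApplicationContext.Current.Services.ContentService;
IContent article = cs.CreateContent(title, parentId, "newsArticle");
article.SetValue("bodyText", importedHtml);
cs.Save(article);                  // first save persists the node

article.CreateDate = originalEktronDate;
cs.Save(article);                  // the quirk: CreateDate only held
                                   // after this second save
```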
One issue we ran into early on, even before going live, was performance. As we imported more and more articles, performance got worse and worse, and it plagued the project all the way to go-live. Unfortunately there isn't a lot of documentation on the differences between a small site and a large one, though forum users were very helpful here. I wish Umbraco had a wiki page dedicated to the different performance issues you can run into.
The main culprit was any call to Descendants. If you're building a large site, I wouldn't recommend using this call anywhere. Once a node has more than about 100 descendants it really starts to bog down, and with some of our nodes having 10,000+ children you can see the issue.
Luckily there was a hero in Examine, which is very fast even for large data sets. I ended up converting a large number of Descendants-type queries to Examine. It did take a bit of work piecing together forum posts to make it all workable.
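As a rough sketch of the kind of conversion involved (Umbraco 7-era Examine API; "newsArticle" is a hypothetical document type alias, and "ExternalSearcher" is the default out-of-the-box searcher name):

```csharp
using Examine;
using UmbracoExamine;

// Before: walks the whole subtree in memory -- bogs down past
// ~100 descendants.
// var articles = Model.Content.Descendants()
//     .Where(x => x.DocumentTypeAlias == "newsArticle");

// After: ask the Examine index instead.
var searcher = ExamineManager.Instance
    .SearchProviderCollection["ExternalSearcher"];
var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);
var query = criteria.NodeTypeAlias("newsArticle").Compile();
ISearchResults results = searcher.Search(query);
```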
The site went live and there weren't any major issues.
Let me know if you have any questions!
Thanks for sharing this Brad! Great to hear about different Umbraco experiences and solutions.
Hi Brad, I have a large media website with 100K+ articles (title, images, lots of rich content) that I'm thinking of migrating to Umbraco, and I'm wondering if you had any issues with the way Umbraco caching works. As I understand it, Umbraco caches all published data (so all 100K articles) on start-up, modifies the cache on publish/unpublish, flushes the XML to disk, etc. Thanks, Pavel
Hi Pavel,
Yes, all published items reside in the umbraco.config XML file in the App_Data folder, so on really large sites the size of this file, and the time spent writing to it, can be a problem.
At Moriyama we put together a notion of 'archived content' on a large media site; you can read the blog post here:
https://www.moriyama.co.uk/about-us/news/blog-the-need-for-archived-content-in-umbraco-and-how-to-do-it/
But essentially the idea is to allow items to be unpublished from the XML cache file while remaining available on the website as if they were still published, served from an archive. We used Azure Search to return both published and archived content in searches and listing pages.
This enabled the site to hold large numbers of articles without the strain on the umbraco.config XML cache file.
regards
Marc
You can honestly probably get by with disabling the XML cache, as long as you make extensive use of partial caching and Examine. Our umbraco.config is ~200 MB and we don't really notice any issues.
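For what it's worth, in Umbraco 7 partial caching is a one-liner in Razor; something along these lines ("ArticleListing" is just an example view name):

```csharp
@* Cache the rendered partial for an hour (3600 seconds) so repeated
   requests serve the cached HTML instead of re-querying content.
   cacheByPage gives each page its own cached copy. *@
@Html.CachedPartial("ArticleListing", Model, 3600, cacheByPage: true)
```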