Umbraco for large websites
Is Umbraco suitable for big media websites that have 1 million or more pages? We have a big newspaper client who wants to use Umbraco for their CMS.
What are the technical considerations if we decide to use Umbraco for the site?
I can see most people suggest Drupal, and there are extensions focused on media companies (https://thunder.org/about-thunder), but we would like to use Umbraco as it provides a much nicer CMS experience.
Ajmal,
I know Moriyama did a rebuild of www.antiquestradegazette.com, and they also created some packages that they used for it. One is for mass publishing: https://our.umbraco.org/projects/backoffice-extensions/publication-queue/. They also put news content into Azure Search: https://our.umbraco.org/projects/developer-tools/moriyama-azure-search/.
Basically, older archive news items were served from Azure Search to keep them out of the main site XML cache. That stopped their XML cache file from getting too big. I recall there was a talk on how they did it.
Regards
Ismail
In short, yes, you can do it, but not out of the box. Like Ismail says, you have to do some trickery. Of those 1 million content items very few are active: 5% of them might make up 99% of your traffic, so that's the 5% you keep in Umbraco, and the others you only load from the DB (or other storage medium) as and when requested. This keeps the back office speedy for editing new content while still allowing your site to serve up old content (which shouldn't change).
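To make the idea concrete, here is a rough, language-neutral sketch in Python (not Umbraco code) of that two-tier lookup: check the small active set first, and only go to the archive store when the item is not there. The names `TieredContentStore`, `load_archived` and `archive_hits` are made up for illustration; in a real site the archive loader would be a database or search-index query.

```python
from typing import Callable, Dict, NamedTuple, Optional


class Article(NamedTuple):
    url: str
    title: str


class TieredContentStore:
    """Serve active articles from memory and archived ones on demand."""

    def __init__(
        self,
        active: Dict[str, Article],
        load_archived: Callable[[str], Optional[Article]],
    ) -> None:
        # Hypothetical stand-in for the CMS cache holding the small active set.
        self.active = active
        # Hypothetical loader for archived items (database, search index, ...).
        self.load_archived = load_archived
        # Counts how often a request fell through to the archive.
        self.archive_hits = 0

    def get(self, url: str) -> Optional[Article]:
        article = self.active.get(url)
        if article is not None:
            return article
        # Not in the active set: load from the archive only when requested.
        self.archive_hits += 1
        return self.load_archived(url)


# Example wiring: a dict stands in for the archive storage.
archive = {"/news/1997/old-story": Article("/news/1997/old-story", "Old story")}
store = TieredContentStore(
    active={"/news/today": Article("/news/today", "Today")},
    load_archived=archive.get,
)
```

With the 5% figure above, a 1 million item site would keep roughly 50,000 items in the active set and everything else behind `load_archived`.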
Hi Ajmal
Yes, what Pete and Ismail say above - lots of little hoops to jump through - but we put together a blog post outlining the gist of the approach here:
https://www.moriyama.co.uk/about-us/news/blog-the-need-for-archived-content-in-umbraco-and-how-to-do-it/
There was quite a bit of fun along the way, but the principle is working well: each fortnight a new edition of the paper triggers the importer to add another 50-60 articles into Umbraco. (Articles go back to 1997! Older articles are rarely accessed, so they are served from the 'archive' rather than the Umbraco cache, but because their content is indexed in Azure Search, the archived articles remain searchable across the site.)
regards
Marc
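As a rough illustration of the archive-plus-search principle described above (not the actual Moriyama implementation), the Python sketch below splits articles into live and archived sets and then indexes both, so archived items stay searchable. The age cutoff, `split_by_age` and `build_search_index` are assumptions made up for this example, and the in-memory dictionary only stands in for a real search service such as Azure Search.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Article(NamedTuple):
    url: str
    title: str
    published: date


def split_by_age(
    articles: Sequence[Article], today: date, max_age_days: int
) -> Tuple[List[Article], List[Article]]:
    """Return (live, archived); the age cutoff is an assumed archiving policy."""
    live: List[Article] = []
    archived: List[Article] = []
    for article in articles:
        age_days = (today - article.published).days
        (live if age_days <= max_age_days else archived).append(article)
    return live, archived


def build_search_index(articles: Sequence[Article]) -> Dict[str, List[str]]:
    """Toy inverted index over titles, standing in for a search service."""
    index: Dict[str, List[str]] = {}
    for article in articles:
        for word in set(article.title.lower().split()):
            index.setdefault(word, []).append(article.url)
    return index


articles = [
    Article("/a/1997-sale", "Auction sale report", date(1997, 5, 1)),
    Article("/a/new-sale", "Record sale at auction", date(2024, 1, 10)),
]
live, archived = split_by_age(articles, today=date(2024, 1, 20), max_age_days=365)
# Index both sets so archived articles remain searchable across the site.
index = build_search_index(live + archived)
```

Only the `live` list would be kept in the CMS cache; a search for "sale" still finds both URLs because the index covers the archive too.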