How to handle and archive a large amount of news and still be able to retrieve it - Umbraco 8
Hi guys,
We have a large amount of data (500,000+ news items and 500 GB of media) that we intend to migrate into Umbraco. We ran several POCs for this and added more than 40,000 nodes to Umbraco 8, and we hit some issues:
The navigation through the backoffice was slow.
Saving and publishing the parent node, or any node with a large number of children, takes a long time.
Adding or changing any document type property takes a long time as well.
Something we expect to happen: displaying news will be very slow, since the cache file would be too large.
We have found some workarounds, such as creating an archive and moving the old news into it so that the old news is not cached. Another option was to display only the active news (the last three months) in the Umbraco backoffice.
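As a rough illustration of the "last three months" rule, here is a minimal, language-agnostic sketch in Python. It is not Umbraco code: `NewsItem`, `split_by_age`, and the 90-day default are hypothetical names and values chosen only to show how a migration script might decide which items stay active and which go to the archive.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class NewsItem:
    # Hypothetical shape of a news record during migration.
    node_id: int
    title: str
    published: date


def split_by_age(items, today, active_days=90):
    """Split items into (active, archive) using a publish-date cutoff.

    Items published on or after the cutoff stay active; older ones are
    candidates for the archive. 90 days approximates "three months".
    """
    cutoff = today - timedelta(days=active_days)
    active = [item for item in items if item.published >= cutoff]
    archive = [item for item in items if item.published < cutoff]
    return active, archive
```

Running the split once at migration time, and then on a schedule, would keep the number of active nodes roughly constant instead of growing with the full history.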
Okay, now my questions are:
- What is your best solution for this issue? How would you handle it?
- How can we archive old news, given that we use Examine to retrieve nodes and need the old news URLs to keep working?
- How can we exclude the archive from the Umbraco cache?
- How can we archive the media as well? We have a large media library, and we don't want the old media to appear in the Umbraco Media section.
Thanks,
Hi Saif
We had a similar issue in V7 for a site and this is how we approached a solution:
https://www.moriyama.co.uk/about-us/news/blog-the-need-for-archived-content-in-umbraco-and-how-to-do-it/
Hopefully that provides some inspiration. The main gist is that news articles need to be gettable, but not necessarily 'published' as far as Umbraco is concerned.
regards
Marc
Hi Saif
Thanks for sharing that article. I have a similar need at the moment; however, in my case I also need to allow the archived items back into a space where an editor can make changes.
So it is not just about serving the last version of the content.
I think for scenarios where the content does not need to be edited again, the approach of storing the XML in a custom table and using the URL to retrieve it works really well.
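To make the custom-table idea concrete, here is a minimal sketch in Python using SQLite. It is not the implementation from the linked article and it does not use any Umbraco API: the table name `archived_content`, the helper names, and the URL normalisation rule are all assumptions made for illustration. The point is only that the archived XML is keyed by the page URL, so an old URL can still be resolved after the node is gone from the content tree.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the hypothetical archive table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archived_content ("
        "url TEXT PRIMARY KEY, xml TEXT NOT NULL)"
    )
    return conn


def normalise_url(url):
    """Lower-case the path and drop any trailing slash (assumed rule)."""
    path = url.strip().lower().rstrip("/")
    return path or "/"


def archive_item(conn, url, xml):
    """Store the last published XML of a node under its URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO archived_content (url, xml) VALUES (?, ?)",
            (normalise_url(url), xml),
        )


def fetch_archived(conn, url):
    """Return the stored XML for a URL, or None if it was never archived."""
    row = conn.execute(
        "SELECT xml FROM archived_content WHERE url = ?",
        (normalise_url(url),),
    ).fetchone()
    return row[0] if row else None
```

In a real site the lookup would have to be wired into request handling so that it only runs when the CMS itself finds no published node for the URL.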
I am at present thinking of either making a separate section altogether to hold those nodes, or having a node in the content section where "old content" can be relocated and unpublished.
@Francis,
Your solution is the best bet: create a new section and either do the move yourself or allow your editors to move the articles to the new "archive" section. They will still take up space on your server, but they remain reachable, editable, and gettable.
We did this and then took inventory with the editors: for anything that was no longer needed, they were to open the page in the browser, save a copy, and then delete it.
Your other option is a "Staging" and "Prod" setup (or whatever you name them), where Staging holds all the content and pushes it to Prod. If you need space on the Prod server, you just delete the nodes on Prod and keep the Staging server as your archive, and your editors publish in Staging to push to Prod.