Testing with millions of pages - optimising the backoffice
Hi guys
I'm currently prototyping Umbraco CMS for a site with potentially millions of active pages (so far I've prototyped with 400,000 pages and have already hit resource issues).
I have already set XmlCacheEnabled to False, since we will have a third-party caching server on top of Umbraco.
I'm using Umbraco 7.12.2.
I have a site structure that looks like this:
Page 1
    Page 1.1
        Page 1.1.1
        Page 1.1.2
        Page 1.1.3
        Page ...
    Page 1.2
    Page 1.3
    ...
Page 2
    ...
Page 1.1 contains 9,000 sub, sub-sub and sub-sub-sub pages.
I've noticed that in the backoffice, if I click on Page 1.1, it loads all of its sub pages into memory. As a result, Page 1.1 loads really slowly (memory usage goes up to 4,000-5,000 MB), while its sub pages load quickly. (This is also an issue when I log in as Admin and click on the Home node or a 2nd-level node.)
Is there any way to avoid loading the sub pages when I click on a parent node or any other page?
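Expanding Page 1.1 means materialising every descendant at once, whereas a paged listing only needs one screen of direct children per request, so its memory cost is bounded by the page size rather than by the size of the subtree. The sketch below is generic Python showing the skip/take arithmetic behind paging; it is not the backoffice's actual implementation, and `page_size` and the generated node names are illustrative assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(items: Sequence[T], page_number: int, page_size: int) -> List[T]:
    """Return one 1-based page of items using skip/take arithmetic.

    A real data source would push skip/take into the query itself (for example
    an OFFSET/FETCH clause) so only page_size rows leave the database; slicing
    an in-memory list here just keeps the example self-contained.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    skip = (page_number - 1) * page_size
    return list(items[skip:skip + page_size])


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` items (ceiling division)."""
    return -(-total // page_size)


# Hypothetical stand-in for the 9,000 pages under Page 1.1.
children = [f"Page 1.1.{i}" for i in range(1, 9001)]

first = get_page(children, page_number=1, page_size=50)
print(len(first), first[0], first[-1])  # 50 Page 1.1.1 Page 1.1.50
print(page_count(len(children), 50))   # 180
```

Each request touches at most 50 items here, however many pages sit under the parent.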
Thank you
Shinsuke
Hi Shinsuke,
You can set the parent node to be a list view, which means it no longer expands in the tree but instead shows its children in a paged list.
You would then need to enable the list view on the child pages as well, so you can see their children too.
Hope this helps
Matt
Hi Matt,
Thank you for the reply.
That's a good idea, let me give it a try. I'll have to create new Document Types and convert the existing pages, so it might take some time, but I will keep this thread updated.
Cheers
Hi Matt,
I had to re-create the website again with 400,000 pages, because the previous one stopped responding for some reason and I couldn't debug it.
Anyhow, using the list view seems to work for the backoffice. However, I have the same issue on the front end (client facing). When I load the home page for the first time, I have to wait a long time and then I get an error message.
IIS Worker Process: 3,075.8 MB
SQL Server: around 3,000 MB
This is the log:
2018-09-20 13:28:28,631 [P2212/D2/T1] INFO Umbraco.Core.CoreBootManager - Umbraco 7.12.2 application starting on EC2AMAZ-KPL4I55
2018-09-20 13:28:28,644 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Determining hash of code files on disk
2018-09-20 13:28:28,653 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Hash determined (took 9ms)
2018-09-20 13:28:28,706 [P2212/D2/T1] INFO Umbraco.Core.MainDom - Acquiring MainDom...
2018-09-20 13:28:28,706 [P2212/D2/T1] INFO Umbraco.Core.MainDom - Acquired MainDom.
2018-09-20 13:28:28,709 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IDiscoverable
2018-09-20 13:28:28,754 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IDiscoverable (took 44ms)
2018-09-20 13:28:28,754 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IApplicationStartupHandler
2018-09-20 13:28:28,755 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IApplicationStartupHandler (took 1ms)
2018-09-20 13:28:28,785 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IDiscoverable
2018-09-20 13:28:28,785 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IDiscoverable (took 0ms)
2018-09-20 13:28:28,786 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving Umbraco.Core.PropertyEditors.IPropertyEditorValueConverter
2018-09-20 13:28:28,786 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved Umbraco.Core.PropertyEditors.IPropertyEditorValueConverter (took 0ms)
2018-09-20 13:28:28,786 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IDiscoverable
2018-09-20 13:28:28,786 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IDiscoverable (took 0ms)
2018-09-20 13:28:28,787 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving Umbraco.Core.PropertyEditors.IPropertyValueConverter
2018-09-20 13:28:28,789 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved Umbraco.Core.PropertyEditors.IPropertyValueConverter (took 1ms)
2018-09-20 13:28:28,795 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IDiscoverable
2018-09-20 13:28:28,795 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IDiscoverable (took 0ms)
2018-09-20 13:28:28,795 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving Umbraco.Web.Mvc.SurfaceController
2018-09-20 13:28:28,796 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved Umbraco.Web.Mvc.SurfaceController (took 0ms)
2018-09-20 13:28:28,796 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving umbraco.interfaces.IDiscoverable
2018-09-20 13:28:28,796 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved umbraco.interfaces.IDiscoverable (took 0ms)
2018-09-20 13:28:28,796 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving Umbraco.Web.WebApi.UmbracoApiController
2018-09-20 13:28:28,797 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved Umbraco.Web.WebApi.UmbracoApiController (took 0ms)
2018-09-20 13:28:30,399 [P2212/D2/T1] INFO Umbraco.Core.DatabaseContext - CanConnect = True
2018-09-20 13:28:30,525 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolving Umbraco.Core.Models.PublishedContent.PublishedContentModel
2018-09-20 13:28:30,528 [P2212/D2/T1] INFO Umbraco.Core.PluginManager - Resolved Umbraco.Core.Models.PublishedContent.PublishedContentModel (took 2ms)
2018-09-20 13:28:30,589 [P2212/D2/T1] INFO Umbraco.Web.Cache.CacheRefresherEventHandler - Initializing Umbraco internal event handlers for cache refreshing
2018-09-20 13:28:30,707 [P2212/D2/T1] INFO Umbraco.Web.Search.ExamineEvents - Initializing Examine and binding to business logic events
2018-09-20 13:28:30,707 [P2212/D2/T1] INFO Umbraco.Web.Search.ExamineEvents - Adding examine event handlers for index providers: 3
2018-09-20 13:28:30,747 [P2212/D2/T1] INFO Umbraco.Core.CoreBootManager - Umbraco application startup complete (took 2193ms)
2018-09-20 13:28:31,075 [P2212/D2/T10] INFO Umbraco.Core.Sync.ApplicationUrlHelper - New ApplicationUrl detected: http://##.##.##.###:8011/umbraco
2018-09-20 13:28:31,075 [P2212/D2/T10] INFO Umbraco.Core.Sync.ApplicationUrlHelper - ApplicationUrl: http://##.##.##.###:8011/umbraco (UmbracoModule request)
2018-09-20 13:28:31,399 [P2212/D2/T10] INFO umbraco.content - Loading content from database...
2018-09-20 13:30:28,468 [P2212/D2/T10] ERROR Umbraco.Core.Persistence.UmbracoDatabase - Exception (06ac2b31).
The thread has been aborted, because the request has timed out.
System.Threading.ThreadAbortException: Thread was being aborted.
   at SNIReadSyncOverAsync(SNI_ConnWrapper* , SNI_Packet** , Int32 )
   at SNINativeMethodWrapper.SNIReadSyncOverAsync(SafeHandle pConn, IntPtr& packet, Int32 timeout)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at System.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Byte[] buff, Int32 offset, Int32 len, Int32& totalRead)
   at System.Data.SqlClient.TdsParserStateObject.TryReadString(Int32 length, String& value)
   at System.Data.SqlClient.TdsParser.TryReadSqlStringValue(SqlBuffer value, Byte type, Int32 length, Encoding encoding, Boolean isPlp, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TryReadSqlValue(SqlBuffer value, SqlMetaDataPriv md, Int32 length, TdsParserStateObject stateObj, SqlCommandColumnEncryptionSetting columnEncryptionOverride, String columnName)
   at System.Data.SqlClient.SqlDataReader.TryReadColumnInternal(Int32 i, Boolean readHeaderOnly)
   at System.Data.SqlClient.SqlDataReader.TryReadColumn(Int32 i, Boolean setTimeout, Boolean allowPartiallyReadColumn)
   at System.Data.SqlClient.SqlDataReader.GetValueInternal(Int32 i)
   at System.Data.SqlClient.SqlDataReader.GetValue(Int32 i)
   at petapoco_factory_7(IDataReader )
   at Umbraco.Core.Persistence.Database.<Query>d__74`1.MoveNext()
2018-09-20 13:30:28,474 [P2212/D2/T10] ERROR umbraco.content - Error Republishing
System.InvalidOperationException: Internal connection fatal error. Error state: 15, Token : 115
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.DrainData(TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
   at System.Data.SqlClient.SqlInternalTransaction.Rollback()
   at System.Data.SqlClient.SqlTransaction.Rollback()
   at Umbraco.Core.Persistence.Database.CleanupTransaction()
   at Umbraco.Core.Persistence.Database.AbortTransaction()
   at Umbraco.Core.Scoping.Scope.DisposeLastScope()
   at Umbraco.Core.Scoping.Scope.Dispose()
   at Umbraco.Core.Persistence.UnitOfWork.ScopeUnitOfWork.DisposeResources()
   at Umbraco.Core.DisposableObjectSlim.Dispose(Boolean disposing)
   at Umbraco.Core.DisposableObjectSlim.Dispose()
   at Umbraco.Core.Services.ContentService.BuildXmlCache()
   at umbraco.content.LoadContentFromDatabase()
Is there any way to not load all the pages on the client side, i.e. to retrieve the data from the database instead of caching it in memory?
Thank you
Shinsuke
Hi Shinsuke,
This is probably because you disabled the XML cache.
Normally Umbraco loads content for the front end from the XML cache.
If that is not active, it needs to hit the database.
Could you try with the XML cache on?
Dave
Hi Dave,
The reason I disabled the XML cache was that the file size was reaching 3 GB and saving data was really slow (this might have been because I wasn't using the list view previously).
Let me try turning it on again and republishing.
However, is there any way of retrieving the data for the requested page only, so that when I hit the home page it doesn't load all the pages?
Does Umbraco have a limit on how many pages it can publish?
Can you post the code of your homepage?
Maybe something in there is causing the slowdown.
Dave
Hi Dave,
I just recreated the servers (web and DB server) to mimic the production environment, and I'm running the script now to populate the pages. I'll probably have to leave the script running overnight.
The code is straight out of the box; I haven't written any code yet. I'm just testing the page capacity. My homepage cshtml looks like this, and no controller has been used.
However, what I'll be doing in the future is hijacking the route with a controller that returns JSON data, which the front end then uses to display the page.
I'll make this demo site publicly accessible once I finish populating the pages.
Hi guys,
Update on this: I have created 2 websites pointing to the same database for testing purposes.
XmlCacheEnabled - False:
On the first website, I turned off the cache because the umbraco.config file was getting over 2 GB. However, it seems it stored all the pages in memory (it used 6 GB of memory and then hung), so this is probably not a good option.
XmlCacheEnabled - True:
On the second website, I generated 1,500,000 pages. However, umbraco.config was only 200 MB, and when I looked into the file, a lot of content was missing. I went into the backoffice and republished some of the pages, but that didn't update umbraco.config. When I looked at the page's URL, it said "This document is published but is not in the cache".
I ran "/Umbraco/dialogs/republish.aspx?xml=true" and left it for 8 hours, but the system hung and I couldn't view the website. CPU usage was at 0%.
I'm thinking of going through all the pages in a controller and publishing them one by one.
However, do I need to do this every time?
Hi guys,
Another update on this. I have upgraded our web server and it seems to be working quite well. The admin area is a bit slow, but still usable. The only thing is that it uses a lot of memory.
umbraco.config is 5.9 GB
approx. 1.5 million active pages
average memory usage on the web server is 18-27 GB
This is my current server spec:
Web Server:
Amazon EC2 (t2.2xlarge)
vCPU: 8
Memory: 32 GB
DB Server:
Amazon RDS (db.t2.xlarge)
CPU: 4
Memory: 16 GB
On average, the web server uses 18 GB to 27 GB of memory. I think it uses a lot of memory because Umbraco keeps all the pages in memory so it can easily write to umbraco.config.
Does anyone know a way to reduce the memory usage?
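Dividing the reported figures through gives a rough per-page picture: about 3.9 KB of XML per page on disk against 12 to 18 KB of memory per page, so the web server's memory is about 3.1 to 4.6 times the size of umbraco.config. This is only arithmetic on the numbers quoted in this thread (decimal units assumed); it does not show where the memory actually goes, although a multiple of the file size would fit the suspicion above that the whole cache is held in memory.

```python
# Figures reported in this thread. Decimal units are an assumption; the
# memory-to-file ratios do not depend on it as long as both use the same unit.
pages = 1_500_000
config_bytes = 5.9e9    # umbraco.config on disk
mem_low_bytes = 18e9    # observed web-server memory, low end
mem_high_bytes = 27e9   # observed web-server memory, high end

xml_per_page = config_bytes / pages
mem_per_page_low = mem_low_bytes / pages
mem_per_page_high = mem_high_bytes / pages

print(f"XML on disk per page: {xml_per_page / 1000:.1f} KB")  # 3.9 KB
print(f"Memory per page: {mem_per_page_low / 1000:.0f}-{mem_per_page_high / 1000:.0f} KB")  # 12-18 KB
print(f"Memory / file size: {mem_low_bytes / config_bytes:.1f}x-{mem_high_bytes / config_bytes:.1f}x")  # 3.1x-4.6x
```

If memory scales roughly linearly with page count, the same ratios give a first guess at what a larger site would need.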
Hi Shinsuke,
One option, depending on the pages you have, would be to "archive" some of the pages. There is a great blog post on this approach here: https://www.moriyama.co.uk/about-us/news/blog-the-need-for-archived-content-in-umbraco-and-how-to-do-it/
Matt
Hi Shinsuke,
I was about to post the same blog post as Matthew did.
I have hardly ever seen a site with 1.5 million pages that are all actively visited. A lot of older content may only get visited once a month. The approach in this blog post is a good solution for that.
Dave
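Dave's point about rarely visited content suggests one way to decide what to archive: split pages by how recently they were last visited. A minimal sketch, assuming visit timestamps are available from analytics or server logs; the `PageStats` record, its fields and the 90-day cutoff are hypothetical and are not part of Umbraco or the linked blog post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PageStats:
    """Hypothetical per-page record; the source of last_visited is an assumption."""
    page_id: int
    last_visited: datetime


def split_live_and_archive(
    pages: Iterable[PageStats], now: datetime, max_age: timedelta
) -> Tuple[List[int], List[int]]:
    """Return (live_ids, archive_ids); pages not visited within max_age are archived."""
    cutoff = now - max_age
    live: List[int] = []
    archive: List[int] = []
    for page in pages:
        (live if page.last_visited >= cutoff else archive).append(page.page_id)
    return live, archive


now = datetime(2018, 10, 1)
stats = [
    PageStats(1, now - timedelta(days=2)),
    PageStats(2, now - timedelta(days=40)),
    PageStats(3, now - timedelta(days=400)),
]
live, archive = split_live_and_archive(stats, now, max_age=timedelta(days=90))
print(live, archive)  # [1, 2] [3]
```

The archive list would only identify candidates; how those pages are then moved out of the published tree is the part the blog post covers and is outside this sketch.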
Hi guys,
Thank you for the feedback. I have gone through this blog post as well. However, we are most likely going to have millions of records. I have tried using Examine to search 1.5 million records and it was extremely fast, so I might create another layer on top of Umbraco.
This is most likely one of the biggest data-driven websites I've worked on in my 20 years of programming experience. I did recommend a custom CMS initially, but after I showed them the Umbraco backoffice as an example, they loved how flexible it was.
Let me think more about the architecture of this website (or web application).
Keep you guys updated.
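The "another layer on top of Umbraco" idea, together with the earlier wish to retrieve only the data for the requested page, points at a bounded read-through cache: a page is fetched from the backing store only when it is requested, and only the N most recently used pages stay in memory, so memory is capped no matter how many pages exist. This is a generic Python sketch; `load` stands in for whatever per-page lookup (database query, Examine/Lucene search) such a layer would use, and nothing here is an Umbraco or Examine API.

```python
from collections import OrderedDict
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughLRU(Generic[K, V]):
    """Keep at most `capacity` pages in memory; load any other page on demand."""

    def __init__(self, load: Callable[[K], V], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._load = load
        self._capacity = capacity
        self._items: "OrderedDict[K, V]" = OrderedDict()
        self.loads = 0  # how many times the backing store was hit

    def get(self, key: K) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._load(key)  # cache miss: fetch just this one page
        self.loads += 1
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used page
        return value

    def __len__(self) -> int:
        return len(self._items)


# Hypothetical backing store standing in for a database or search-index lookup.
store: Dict[int, str] = {i: f"<page {i}>" for i in range(1, 1001)}
cache = ReadThroughLRU(load=store.__getitem__, capacity=2)

cache.get(1)
cache.get(2)
cache.get(1)  # page 1 is now the most recently used
cache.get(3)  # evicts page 2, not page 1
print(len(cache), cache.loads)  # 2 3
```

The capacity becomes the memory knob: it trades hit rate against footprint, instead of memory growing with the total number of published pages.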