One of our requirements is to regularly import/update a sizable number of nodes from an external data source. CMSImport continues to look like the best tool for this chore.
Just for reference... when I say sizable, we're talking about 100K+ nodes on a weekly basis.
As you might expect, I've run into issues at this scale with jobs timing out or running out of resources (primarily memory).
I am looking to address these issues by:
Creating a fairly extensive hierarchy of Import Definitions and associated scheduled tasks
Configuring a more robust production environment
Questions:
When it comes to scheduled tasks... are they executed asynchronously, or are they queued to run one at a time?
My intent is to schedule them to avoid conflict, but I'd like to avoid a task having a cascade effect on others.
Are there any other CMSImport considerations that I should keep in mind at this scale?
CMSImport - Scheduled Tasks
Hi Jeremy,
I have my doubts that Umbraco nodes are a good fit when it comes to changing 100K nodes on a weekly basis. Isn't it better to use a database for this? Or could you give me a little background, so I can give you some good advice?
CMSImport could probably also run into a memory issue, since it holds most records in memory during the import. That is not a problem up to 10-20K nodes, but beyond that I'm not sure.
The scheduler does indeed run async, but it will still eat resources, and with this number of records that could be an issue.
Hope to hear from you.
Best,
Richard
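The memory concern above comes from holding most records in memory at once. A common general mitigation is to stream the source and process it in fixed-size batches, so only one batch is resident at a time. The sketch below shows that pattern in Python. It is not a description of how CMSImport works internally, and `batched`, `import_in_batches`, and `save_batch` are hypothetical names.

```python
from itertools import islice


def batched(records, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_batches(records, save_batch, size=1000):
    """Pass records to save_batch one batch at a time; return the total count."""
    total = 0
    for batch in batched(records, size):
        save_batch(batch)  # hypothetical callback that persists one batch
        total += len(batch)
    return total
```

If `records` is a generator reading from the external source, memory use stays close to one batch regardless of whether the full import is 10K or 100K+ records.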
Richard,
Thanks for the speedy reply. Maybe I should change the design to go directly to the database. Honestly, I had been pursuing this tack to keep the front-end development as simple as possible. However, if the complexity and issues I create by bringing the data into Umbraco are too significant, then I should go the other way. Your reaction indicates that this is the case.
I appreciate your candor.
Cheers
Jeremy
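If the data goes directly to the database, a weekly refresh can be written as an idempotent upsert keyed on the external record ID. The following is a rough sketch using Python's built-in `sqlite3` module with a made-up `items` table. The table, its columns, and the function name are assumptions for illustration only, and a real setup would target whatever database the site already uses.

```python
import sqlite3


def upsert_items(conn, rows):
    """Insert or overwrite (external_id, title) rows in one transaction.

    Returns the number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "external_id TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO items (external_id, title) VALUES (?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Running the same import twice leaves one row per external ID, so a failed weekly job can simply be rerun.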
Hi Jeremy,
Yes, that might be best.
Cheers,
Richard