Efficient publishing of a List&lt;Document&gt;
Hey there
I'm dynamically creating nodes, and ATM I have it configured to create and publish each document individually.
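The shape of it is roughly this (a sketch only; the document type alias, property name, and the ExternalItem type are made-up names for illustration):

    using System.Collections.Generic;
    using umbraco.BusinessLogic;
    using umbraco.cms.businesslogic.web;

    public class DocumentImporter
    {
        // Rough sketch: one Document created and published per imported item.
        // "statItem", "metricValue", and ExternalItem are hypothetical names.
        public static void CreateDocuments(IEnumerable<ExternalItem> items, int parentId)
        {
            var author = User.GetUser(0); // admin user
            var docType = DocumentType.GetByAlias("statItem");

            foreach (var item in items)
            {
                Document doc = Document.MakeNew(item.Name, docType, author, parentId);
                doc.getProperty("metricValue").Value = item.Metric;
                doc.Publish(author);
            }
        }
    }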
On every iteration, I'll most likely be creating 5-10 documents, and this will be triggered by a CRON job which will probably fire every few minutes.
Then, after all the documents are made and published, I'm calling something like:
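    // (illustrative: a full republish of the umbraco.config XML cache)
    umbraco.library.RefreshContent();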
I'm a little worried about server load here, and I'm wondering if there's a way to push a pre-published List&lt;Document&gt; onto Umbraco in one go?
Any recommendations?
Cheers!
I don't think there's an efficient way to do a whole list of documents in one go...
Can you give us an idea of what the nodes are for? Generating 5-10 documents every few minutes, all the time, is going to result in a monstrous number of documents in a short amount of time (assuming 7 docs every 3 minutes, that's 3,360 a day, roughly 100,000 a month, and over 1.2 million a year). Over time, it'll become unmanageable through the back end.
Depending on what the documents are for, you might be better off storing the auto-generated stuff in a custom database table.
Hey Tim, thanks for the reply.
Yep, sure. I'm getting some data from an external website using their API, and then recreating the info as Umbraco nodes. My primary aim is to generate some statistics off the info I'm collecting; I'm not concerned with being able to manage the nodes in any way, as I realise the number will quickly get way too big.
Although I would be interested in a way of content-managing a small subset of them (let's say the top 100 performers on a key metric), I'm guessing that to do this I'll have to somehow abstract those nodes out of the massive tree they'll be a part of, just so I can deal with them.
Do you have any recommendations for what I'm doing? Can you see any problems arising from maintaining such a massive amount of data within Umbraco? Should I even publish them? (i.e. umbraco.config will become massive; I'm wondering if that will impose a bottleneck.)
Any help appreciated!
If it's just data for reporting, I'd store it in a database table, and then you can use SQL to generate the statistics. I suspect that'd be much faster than trying to store that amount of information as Umbraco nodes.
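As a rough illustration (the ExternalStats table and its columns are made up for the example, not anything Umbraco ships with), the import step could write straight to a custom table with plain ADO.NET:

    using System.Data.SqlClient;

    public class StatsImport
    {
        // Sketch: write one imported item to a hypothetical custom table,
        // ExternalStats (Id INT IDENTITY, ItemName NVARCHAR(255),
        // MetricValue INT, RecordedOn DATETIME).
        public static void SaveStat(string connectionString, string itemName, int metricValue)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO ExternalStats (ItemName, MetricValue, RecordedOn) " +
                "VALUES (@name, @metric, GETDATE())", conn))
            {
                cmd.Parameters.AddWithValue("@name", itemName);
                cmd.Parameters.AddWithValue("@metric", metricValue);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }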
If you have it in a table, you also won't be bound by Umbraco's node schema, which could otherwise make running complex queries against the imported data trickier than you'd like. If you wanted to be able to manage some (or all) of the data from the table in the Umbraco back end, you could write a custom section to manage it. Here's an example of writing a custom section for the back office: http://www.geckonewmedia.com/blog/2009/8/3/how-to-create-a-custom-section-in-umbraco-4
Thanks for the reply again!
I'm really in two minds about this one now. I've heard that reading from the data Umbraco caches in RAM (from umbraco.config) is very quick (faster than looking it up in the DB), but if I want to run a variety of queries over, let's say, 10,000-100,000 rows of data (or, in umbraco.config's case, nodes), I'm still unsure which would be quickest. Or perhaps the more applicable question is: would handling that much data as nodes even be usable (given that the front-end page searches through the data and shows stats based on a date range, for example)?
All this talk and thought has made it apparent to me that my application is in great danger of becoming too slow / unusable.
For this type of application, would you suggest I just create custom tables in the DB, maybe set up LINQ to SQL (or is there an Umbraco control I can use for talking to the DB?) for modifying that, and then create some sort of custom datatype for viewing the data? (This would need a few options, like a table view where you can specify the date range of data you want to view, etc.)
It's either that or storing this stuff as nodes, which gives a nice GUI layout (and I must admit I do like the idea), but performance is the key issue here!
Sorry it took so long to get back; I've been busy thinking about things and wanted to formulate the right question.
Any help appreciated again!
Cheers
The nodes in Umbraco are stored as XML, so that means using XPath to do your queries. Umbraco uses the Microsoft XML implementation, so you're stuck with XSLT 1.0, which means you'll be missing a lot of useful query types, like grouping etc.
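For example, a lookup against the in-memory cache looks something like this (a sketch only; the aliases are made up, and it assumes the old v4 XML schema where documents are &lt;node&gt; elements with &lt;data&gt; children):

    using System.Xml;

    public class CacheQueries
    {
        // Sketch: XPath against the published XML cache. Fine for simple
        // lookups, but there's no grouping or aggregation to be had here.
        public static XmlNodeList FindHighScorers()
        {
            XmlDocument cache = umbraco.content.Instance.XmlContent;
            return cache.SelectNodes(
                "//node[@nodeTypeAlias='statItem'][number(data[@alias='metricValue']) > 100]");
        }
    }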
If you're wanting to run reporting-type queries on the data, I think SQL would be your best bet, as it's got a more fully featured language for mining the data (grouping, aggregate queries, pivot tables etc.). It's also optimised for running queries on large sets of data (as long as your table is indexed correctly). If the reporting info doesn't need to be updated each time it's requested, you could look at caching the returned data, to avoid unnecessary trips to the database.
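For example (sketch only, reusing the made-up ExternalStats table from above), a grouped date-range report is a single query, and an index on RecordedOn keeps it fast as the table grows:

    using System;
    using System.Data.SqlClient;

    public class StatsReport
    {
        // Sketch: daily totals over a date range, grouped in SQL rather than in code.
        public static void PrintDailyTotals(string connectionString, DateTime from, DateTime to)
        {
            const string sql =
                @"SELECT CONVERT(VARCHAR(10), RecordedOn, 120) AS Day,
                         COUNT(*) AS Entries,
                         SUM(MetricValue) AS Total
                  FROM ExternalStats
                  WHERE RecordedOn >= @from AND RecordedOn < @to
                  GROUP BY CONVERT(VARCHAR(10), RecordedOn, 120)
                  ORDER BY Day";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@from", from);
                cmd.Parameters.AddWithValue("@to", to);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("{0}: {1} entries, total {2}",
                            reader["Day"], reader["Entries"], reader["Total"]);
                    }
                }
            }
        }
    }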
I'd definitely avoid storing it as nodes. The back office tree is nice, but imagine trying to browse through the tree if it had a million records in it; finding specific nodes would be pretty painful!
If you want to edit/view the data in the back office, you could set up either a custom section or a dashboard; that's effectively just a normal .aspx/.ascx page, allowing you to use things like datagrids etc. (If you're going to use a datagrid with paging, look at using SQL Server paging; otherwise the grid will load ALL of the records to calculate the paging, which would be VERY slow with all those records!)
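Server-side paging on SQL Server 2005+ usually means ROW_NUMBER(); a rough sketch against the same hypothetical table:

    using System.Data.SqlClient;

    public class GridPaging
    {
        // Sketch: fetch a single page of rows via ROW_NUMBER(), so the grid
        // never pulls the whole ExternalStats table into memory.
        public static SqlCommand BuildPageCommand(SqlConnection conn, int startRow, int endRow)
        {
            var cmd = new SqlCommand(
                @"SELECT Id, ItemName, MetricValue, RecordedOn
                  FROM (SELECT Id, ItemName, MetricValue, RecordedOn,
                               ROW_NUMBER() OVER (ORDER BY MetricValue DESC) AS RowNum
                        FROM ExternalStats) AS Numbered
                  WHERE RowNum BETWEEN @startRow AND @endRow", conn);
            cmd.Parameters.AddWithValue("@startRow", startRow);
            cmd.Parameters.AddWithValue("@endRow", endRow);
            return cmd;
        }
    }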
For talking to the DB, you can either write your own code using standard ADO.Net, or you can use the built-in SQL helper library that lives in the umbraco.datalayer dll (I think). If you have a look at the source on CodePlex, there are loads of examples of how the CMS talks to the database using the SQL helper.
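From memory, the helper is exposed as a static property on umbraco.BusinessLogic.Application, so usage is roughly like this (hypothetical query again; treat the exact names as unverified and check the CMS source):

    using umbraco.BusinessLogic;

    public class HelperExample
    {
        // Rough sketch (from memory) of the v4 SqlHelper: a parameterised scalar query.
        public static int CountAboveThreshold(int min)
        {
            return Application.SqlHelper.ExecuteScalar<int>(
                "SELECT COUNT(*) FROM ExternalStats WHERE MetricValue > @min",
                Application.SqlHelper.CreateParameter("@min", min));
        }
    }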
You could look at using LINQ to SQL or something similar, although I'm not sure what the performance would be on a dataset that size; @Slace is the man to ask for that one, he's our resident LINQ guru!
Hope that helps!
:)
Thanks a lot man! I'll look into all the things you've suggested and make a decision :D!
Thanks again!