Umbraco 4.7 system architecture advice for working with 500K+ nodes
Hello, can anybody help me find out whether it is possible to work with 500K+ nodes without a performance hit? I need to sort and search them. Each node has about 10 properties in total, such as name, price, company and country. I need to narrow the results down in stages, like 500K -> country -> 10K -> city -> 1K -> company -> 100. My guess is that SQL Server and stored procedures would be the best solution for this job, but I also need to use everything Umbraco can do with nodes. If you have any experience or ideas on how to implement this, please give me some hints. Links to a concrete implementation (blog posts, tutorials, etc.) would be best.
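To make the staged narrowing concrete, here is a minimal sketch of the SQL route suggested above. It assumes Python's built-in sqlite3 as a stand-in for SQL Server and uses plain parameterized queries instead of stored procedures. The `product` table, its columns and the `ix_product_geo` index are hypothetical names. The point is the composite index: its column order follows the narrowing order (country, then city, then company), so each stage should be able to seek into the index rather than scan every row.

```python
import sqlite3


def build_catalog(rows):
    """Load (name, price, company, country, city) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, name TEXT, price REAL,"
        " company TEXT, country TEXT, city TEXT)"
    )
    # Index columns follow the narrowing order: country -> city -> company.
    conn.execute("CREATE INDEX ix_product_geo ON product (country, city, company)")
    conn.executemany(
        "INSERT INTO product (name, price, company, country, city) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def narrow(conn, country, city=None, company=None):
    """Filter by as many levels as are given, returning (name, price) sorted by price."""
    sql = "SELECT name, price FROM product WHERE country = ?"
    params = [country]
    if city is not None:
        sql += " AND city = ?"
        params.append(city)
    if company is not None:
        sql += " AND company = ?"
        params.append(company)
    sql += " ORDER BY price, name"
    return conn.execute(sql, params).fetchall()


conn = build_catalog([
    ("Widget", 9.5, "Acme", "UK", "London"),
    ("Gadget", 4.0, "Acme", "UK", "London"),
    ("Gizmo", 7.0, "Globex", "UK", "Leeds"),
    ("Doohickey", 3.0, "Acme", "DE", "Berlin"),
])
print(narrow(conn, "UK"))                    # [('Gadget', 4.0), ('Gizmo', 7.0), ('Widget', 9.5)]
print(narrow(conn, "UK", "London", "Acme"))  # [('Gadget', 4.0), ('Widget', 9.5)]
```

With 500K real rows you would check the query plan (EXPLAIN QUERY PLAN in SQLite, the execution plan in SQL Server) to confirm the index is actually used. The `ORDER BY price` still needs a sort, but only over the already narrowed set.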
Thanks
Anatoliy
Hi Anatoliy.
I have sent you a private email with my contact information. Right now I am working on a solution with more than 10K nodes and more than 5 million page views per month. I would be glad to help you and share my experience.
Dmitriy
Hi Anatoliy and Dmitriy,
Since I'm curious about this topic I'd like to know about your experiences with it. Also, from an open-source perspective and future reference, I think it would be a nice move to share it here, if you are willing to do so, that is ;-)
Thanks in advance.
All the best,
Bo
I'd also be interested in hearing about this. I recently built a site with 15,000 nodes, each with around 100 properties, and it just did not work. In fact, the front-end rendering was okay, as you'd expect if the server has enough memory, but it was just impossible to bulk publish any reasonable volume of nodes without hitting a timeout brick wall.
I know there are very big websites running Umbraco 4 but I personally struggled to find any resources detailing the techniques used to accomplish this. Anything shared here would be very valuable information for sure.
Hi guys.
Yes, of course I would like to share my knowledge with the community.
Here is what I have experienced. I currently have more than 10K nodes in Umbraco, and around 50% of them (more than 5000 nodes) share the same document type; let's call it "docTypeOne". Using uComponents to access nodes of document type "docTypeOne", filter them, etc. is slow. Accessing document types that don't have so many nodes works fine.
So I have been using a few techniques to speed up the site:
1. For accessing and querying nodes of "docTypeOne" (we have more than 5000 of them), we switched to Examine search: http://examine.codeplex.com/
2. We added more caching in the backend when accessing nodes, putting the results into the runtime cache.
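The two techniques above can be illustrated with a small sketch. In Umbraco 4 the real pieces are Examine (built on Lucene.Net) and the ASP.NET runtime cache. The Python below neither uses nor imitates their APIs. `FieldIndex` is a hypothetical toy inverted index showing why looking nodes up by field value avoids walking every "docTypeOne" node. `TtlCache` is a hypothetical expiring cache placed in front of that lookup.

```python
import time


class FieldIndex:
    """Toy inverted index (field -> value -> node ids), standing in for Examine/Lucene."""

    def __init__(self):
        self._postings = {}

    def add(self, node_id, fields):
        for field, value in fields.items():
            self._postings.setdefault(field, {}).setdefault(value, set()).add(node_id)

    def search(self, **criteria):
        """Return sorted ids of nodes that match every field=value pair."""
        if not criteria:
            return []
        matches = [self._postings.get(f, {}).get(v, set()) for f, v in criteria.items()]
        return sorted(set.intersection(*matches))


class TtlCache:
    """Minimal expiring cache, loosely modelled on storing results in the runtime cache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get_or_add(self, key, factory):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: skip the expensive lookup
        value = factory()
        self._items[key] = (value, now + self._ttl)
        return value

    def clear(self):
        """Drop everything, e.g. after content managers publish changes."""
        self._items.clear()


index = FieldIndex()
index.add(1, {"docType": "docTypeOne", "country": "UK"})
index.add(2, {"docType": "docTypeOne", "country": "DE"})
index.add(3, {"docType": "other", "country": "UK"})

cache = TtlCache(ttl_seconds=300)
ids = cache.get_or_add(
    ("docTypeOne", "UK"),
    lambda: index.search(docType="docTypeOne", country="UK"),
)
print(ids)  # [1]
```

The expiry time is a trade-off. A longer one keeps more requests away from the index, and a shorter one lets editors see their changes sooner. Clearing the cache when content is published (the wiring is not shown) is one way to get both.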
The biggest gain came from reviewing the structure of the data in Umbraco. I moved data that was not accessed very often out of Umbraco into a separate database, and kept in Umbraco only the data that is:
- used by the node navigation logic;
- changed often by content managers.
Some of the data in Umbraco links to the data in the database by Guid, Id or some other parameter.
I also built a custom section in which the data that was moved to the database is accessed via EF, a normal DAL or some other way, as needed.
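Here is a minimal sketch of that split, assuming a product catalogue like the one in the original question. The Umbraco node keeps only its navigation data and a Guid, and the bulk fields live in an external table keyed by that Guid. sqlite3 stands in for the real database and for EF or a hand-written DAL. `NodeStub`, `product_details` and the helper functions are hypothetical names.

```python
import sqlite3
import uuid
from dataclasses import dataclass


@dataclass
class NodeStub:
    """What stays in Umbraco: navigation data plus a key into the external database."""
    node_id: int
    parent_id: int
    name: str
    details_key: str  # hypothetical property holding a Guid as text


def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_details ("
        "details_key TEXT PRIMARY KEY, price REAL, company TEXT, description TEXT)"
    )
    return conn


def save_details(conn, price, company, description):
    """Store the rarely edited data outside Umbraco and return the Guid that links to it."""
    details_key = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO product_details VALUES (?, ?, ?, ?)",
        (details_key, price, company, description),
    )
    return details_key


def load_details(conn, node):
    """Follow the node's Guid into the external table; None if the row is missing."""
    row = conn.execute(
        "SELECT price, company, description FROM product_details WHERE details_key = ?",
        (node.details_key,),
    ).fetchone()
    if row is None:
        return None
    return {"price": row[0], "company": row[1], "description": row[2]}


store = create_store()
key = save_details(store, 9.5, "Acme", "Long text that editors rarely touch")
widget = NodeStub(node_id=1101, parent_id=1100, name="Widget", details_key=key)
print(load_details(store, widget)["company"])  # Acme
```

Keeping the detail fields out of Umbraco also keeps them out of umbraco.config.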
This way the umbraco.config file was reduced in size a lot, which helped with speed.
My conclusion is that keeping a large amount of data in Umbraco is not good for performance, so normally I use a mix of Umbraco and database storage. But it depends on the project and the amount of data: if I expect the number of nodes to stay at or below 5000, I use only Umbraco.
The other issue is republishing a large number of nodes. Yes, this is a problem, and republishing took some time. It was normally needed when I updated the site and the Umbraco structure had changed.
So when updating the site I follow this procedure:
1. Close access to SITE-1 for all content managers (to avoid losing data).
2. Make a copy of the site (SITE-COPY) and point the domain to it.
3. Implement all changes on SITE-1.
4. Republish the nodes on SITE-1. When the site is not under load, republishing more than 10K nodes normally takes around 30-40 minutes. I monitor the process with a SQL query that counts the published nodes in the Umbraco database.
5. When the changes are done, point the domain back to SITE-1.
6. Open up access for all content managers.
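Step 4 mentions monitoring the republish with a SQL query that counts published nodes, but the exact query isn't given. The sketch below assumes Umbraco 4's `cmsDocument` table with a `published` bit column. Check this against your own database, and confirm that this count actually changes during your republish. The sketch uses sqlite3 in place of SQL Server so that it runs anywhere. `wait_for_republish` and its parameters are hypothetical. It polls the count and gives up after a deadline.

```python
import sqlite3
import time

# Assumption: Umbraco 4's cmsDocument table has a "published" bit column.
PUBLISHED_COUNT_SQL = "SELECT COUNT(*) FROM cmsDocument WHERE published = 1"


def published_count(conn, sql=PUBLISHED_COUNT_SQL):
    return conn.execute(sql).fetchone()[0]


def wait_for_republish(conn, expected, poll_seconds=30.0, timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until `expected` documents are published; return every count observed."""
    deadline = clock() + timeout_seconds
    samples = []
    while True:
        count = published_count(conn)
        samples.append(count)
        if count >= expected:
            return samples
        if clock() >= deadline:
            raise TimeoutError(f"only {count} of {expected} documents published")
        sleep(poll_seconds)


# Demo against a fake table: each "sleep" publishes four more documents.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cmsDocument (nodeId INTEGER, published INTEGER)")
demo.executemany("INSERT INTO cmsDocument VALUES (?, 0)", [(i,) for i in range(1, 11)])


def fake_sleep(_seconds):
    demo.execute(
        "UPDATE cmsDocument SET published = 1 WHERE nodeId IN "
        "(SELECT nodeId FROM cmsDocument WHERE published = 0 LIMIT 4)"
    )


progress = wait_for_republish(demo, expected=10, sleep=fake_sleep)
print(progress)  # [0, 4, 8, 10]
```

Given the 30-40 minutes reported for more than 10K nodes, a 30-second poll and a one-hour deadline seem like a sensible range. Both are just defaults chosen for the sketch.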
I didn't experience any timeout problems when republishing all nodes.
Dan, I think that at http://YOURDOMAIN/Umbraco/dialogs/republish.aspx?xml=true you can extend the timeout so you don't hit it.
If anybody has other tips and tricks on how to structure an Umbraco site with large amounts of data for best performance, I would really appreciate it.
P.S. My contact info
d (dot) skudnov (at) gmail (dot) com
Feel free to contact me with any questions.