
  • Anatoliy 2 posts 23 karma points
    Jul 06, 2012 @ 10:21
    Anatoliy
    1

    Umbraco 4.7 system architecture advice to work with 500K+ nodes

    Hello, can anybody help me find out whether it is possible to work with 500K+ nodes without a speed impact? I need to sort and search them; each node has about 10 properties in total, such as name, price, company and country. So I need to narrow the set down step by step, e.g. 500K nodes -> country (10K) -> city (1K) -> company (100). My guess is that SQL Server and stored procedures would be the best solution for this job, but I still need to use everything Umbraco can do with nodes. If you have any experience or ideas on how to implement this, please give me some hints. Links to a concrete implementation (blog posts, tutorials, etc.) would be best.

    Thanks

    Anatoliy

  • Dmitriy Skudnov 39 posts 64 karma points
    Jul 08, 2012 @ 13:37
    Dmitriy Skudnov
    0

    Hi Anatoliy.

    I have sent you a private mail with my contact information. Right now I am working on a solution with more than 10K nodes and more than 5 million page views per month. I would be glad to help you and share my experience.

    Dmitriy

  • Bo Damgaard Mortensen 719 posts 1207 karma points
    Jul 08, 2012 @ 16:43
    Bo Damgaard Mortensen
    0

    Hi Anatoliy and Dmitriy,

    Since I'm curious about this topic I'd like to know about your experiences with it. Also, from an open-source perspective and future reference, I think it would be a nice move to share it here, if you are willing to do so, that is ;-)

    Thanks in advance.

    All the best,

    Bo

     

  • Dan 1285 posts 3917 karma points c-trib
    Jul 08, 2012 @ 22:29
    Dan
    0

    I'd also be interested in hearing about this.  I recently built a site with 15000 nodes, each with around 100 properties and it just did not work.  In fact the front end rendering was okay as you'd expect if the server has enough memory but it was just impossible to bulk publish any reasonable volume of nodes without hitting a timeout brick wall.

    I know there are very big websites running Umbraco 4 but I personally struggled to find any resources detailing the techniques used to accomplish this.  Anything shared here would be very valuable information for sure.

  • Dmitriy Skudnov 39 posts 64 karma points
    Jul 09, 2012 @ 10:42
    Dmitriy Skudnov
    1

    Hi guys.

    Yes, of course I would like to share my knowledge with the community.

    So, here is what I have experienced. I have more than 10K nodes in Umbraco right now, and around 50% of them (more than 5,000 nodes) share the same document type; let's call it "docTypeOne". Using uComponents to access and filter nodes of type "docTypeOne" is slow. Accessing document types that don't have so many nodes works fine.

    So I have been using some techniques to speed up the site.

    1. When getting nodes of type "docTypeOne" (more than 5,000 nodes), we switched to Examine search: http://examine.codeplex.com/

    2. We added more caching in the back end for node access, putting the results into the runtime cache.
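
    A minimal sketch of the caching idea in point 2, written in Python rather than .NET to keep it short. `RuntimeCache`, `get_or_add` and `slow_node_query` are hypothetical names for illustration, not Umbraco or uComponents APIs; the point is only that an expensive node query runs once and later requests reuse the result until it expires.

```python
import time
from typing import Any, Callable, Dict, Tuple


class RuntimeCache:
    """Tiny in-memory cache with per-entry expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_add(self, key: str, factory: Callable[[], Any]) -> Any:
        """Return the cached value, or build it with factory() and store it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = factory()
        self._entries[key] = (now + self._ttl, value)
        return value

    def clear(self) -> None:
        """Drop everything, e.g. after content is republished."""
        self._entries.clear()


calls = 0


def slow_node_query():
    """Stand-in for an expensive lookup of all 'docTypeOne' nodes."""
    global calls
    calls += 1
    return ["node-%d" % i for i in range(3)]


cache = RuntimeCache(ttl_seconds=60)
first = cache.get_or_add("docTypeOne:all", slow_node_query)
second = cache.get_or_add("docTypeOne:all", slow_node_query)  # served from cache
```

    In a real site the cache would also need to be cleared when content is published, otherwise visitors see stale results until the entry expires.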

    The biggest gain came from reviewing the structure of the data in Umbraco. I moved data that was not accessed very often out of Umbraco into a database, and kept in Umbraco only:

    - data used by the node navigation logic;

    - data that is changed often by content managers.

    Some of the data in Umbraco links to data in the DB by Guid, Id or some other parameter.

    I also built a custom section where the data that was transferred to the DB is accessed via EF, a normal DAL or some other way (as needed).

    This way the umbraco.config file shrank a lot, which helped speed.
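
    The hybrid layout described above could look roughly like the following sketch. It uses Python and an in-memory SQLite database purely for illustration; the `offers` table, its columns and the `offerId` property are assumptions, not part of any Umbraco schema. The node keeps only a key, the bulk data lives in an indexed table, and filtering happens in SQL.

```python
import sqlite3
import uuid

# In-memory database standing in for the external SQL store (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE offers ("
    " id TEXT PRIMARY KEY,"
    " country TEXT NOT NULL,"
    " city TEXT NOT NULL,"
    " company TEXT NOT NULL,"
    " price REAL NOT NULL)"
)
# Composite index so country -> city -> company narrowing avoids full scans.
db.execute("CREATE INDEX ix_offers_place ON offers (country, city, company)")


def add_offer(country, city, company, price):
    """Insert a record and return the GUID that a content node would store."""
    offer_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO offers VALUES (?, ?, ?, ?, ?)",
        (offer_id, country, city, company, price),
    )
    return offer_id


def details_for_node(node):
    """Follow the node's stored key to the bulk data kept outside the CMS."""
    return db.execute(
        "SELECT country, city, company, price FROM offers WHERE id = ?",
        (node["offerId"],),
    ).fetchone()


offer_id = add_offer("DK", "Odense", "Acme", 99.0)
add_offer("DK", "Aarhus", "Globex", 120.0)

# A content node reduced to what editors and navigation need, plus the link.
node = {"name": "Acme landing page", "offerId": offer_id}
details = details_for_node(node)

# Narrowing by country and city is done by the database, not by walking nodes.
in_odense = db.execute(
    "SELECT company FROM offers WHERE country = ? AND city = ? ORDER BY price",
    ("DK", "Odense"),
).fetchall()
```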

     

    My own conclusion is that keeping a large amount of data in Umbraco is not good for performance. So normally I use a mix of Umbraco and database storage, but it depends on the project and the amount of data. If I expect the number of nodes to stay below 5,000, I use only Umbraco.

     

    The other issue is republishing a large number of nodes. Yes, this is a problem, and republishing took some time. Normally it was needed when I updated the site and the Umbraco structure changed.

    So when updating the site I follow this procedure:

    1. Close access for all content managers on SITE-1 (to avoid losing data).

    2. Make a copy of the site (SITE-COPY) and point the domain to it.

    3. Implement all changes on SITE-1.

    4. Republish the nodes on SITE-1. When the site is not under load, republishing more than 10K nodes normally takes around 30-40 minutes. I monitor the process with a SQL query that counts the published nodes in the Umbraco DB.

    5. When the changes are done, point the domain back to SITE-1.

    6. Open up access for all content managers again.
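
    The monitoring in step 4 can be sketched as a simple count query. This is an illustration only, in Python against an in-memory SQLite stand-in: the `documents` table and its `published` column are hypothetical, and the real Umbraco database uses its own table and column names, which should be checked before adapting this.

```python
import sqlite3


def count_published(conn):
    """Count rows flagged as published in the stand-in table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE published = 1"
    ).fetchone()
    return row[0]


def progress(conn, total):
    """Return (published count, percentage of total) for a progress readout."""
    done = count_published(conn)
    percent = 100.0 * done / total if total else 0.0
    return done, percent


# Stand-in schema (assumption): one row per node with a published flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (node_id INTEGER PRIMARY KEY, published INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)", [(i, 0) for i in range(1, 11)]
)

before = progress(conn, total=10)

# Simulate the republish job having processed the first four nodes.
conn.execute("UPDATE documents SET published = 1 WHERE node_id <= 4")
after = progress(conn, total=10)
```

    Running such a query every minute or so during the republish shows whether the job is still advancing or has stalled.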

     

    I didn't experience timeout problems when republishing all nodes.

    Dan, I think you can extend the timeout for http://YOURDOMAIN/Umbraco/dialogs/republish.aspx?xml=true so that you don't hit it.

     

    If somebody has other tips and tricks on how an Umbraco site with large amounts of data can be structured for best performance, I would really appreciate it.

    P.S. My contact info:

    d (dot) skudnov (at) gmail (dot) com

    Feel free to contact me with any questions.
