  • Ali Sheikh Taheri 470 posts 1648 karma points c-trib
    Aug 06, 2010 @ 10:40
    Ali Sheikh Taheri
    0

    Why XML and not database for umbraco?

    Just wondering why Umbraco works with an XML file rather than database operations? (Umbraco is a perfect choice for small and medium projects, but not for big ones, because of the following points.)

    I believe that working with a DBMS is much faster than working with files, and that the structure of XML is only good for small projects.

    But when it comes to big projects with 100,000 articles, you have a lot of redundancy in the database as well as in umbraco.config. (It's the nature of XML.)

    Any explanation for this?

    Ali

  • Sascha Wolter 615 posts 1101 karma points
    Aug 06, 2010 @ 12:07
    Sascha Wolter
    3

    Hi Ali,

    I'm not 100% on all of this, but here goes:
    Operations on an XML file using XSLT are tremendously faster than connecting to a DBMS, retrieving the data, converting it so that whatever language is in use can understand it, and then preparing the data again to display it on the site. For my part, I can't believe that XSLT transformations aren't more popular in the web world, especially since e.g. MS is using XML for their documents as well, and converting web site data from XML to a document is a breeze.
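To make the "query the XML directly" idea concrete, here is a minimal sketch in Python. It is not Umbraco code: the XML snippet, its element and attribute names, and the `child_names` helper are all made up for illustration. It only shows that once a document is parsed into memory, finding a node and listing its children needs no connection, query round trip, or row-to-object mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a published-content XML cache.
# Element and attribute names are illustrative, not Umbraco's real schema.
CACHE_XML = """
<root>
  <node id="1" name="Home">
    <node id="2" name="News">
      <node id="3" name="Article A" />
      <node id="4" name="Article B" />
    </node>
  </node>
</root>
"""

# Parse once and keep the tree in memory for every later lookup.
TREE = ET.fromstring(CACHE_XML)


def child_names(parent_id):
    """Return the names of the direct children of the node with this id."""
    parent = TREE.find(f".//node[@id='{parent_id}']")
    if parent is None:
        return []
    return [child.get("name") for child in parent.findall("node")]


print(child_names("2"))
```

Whether this beats a database in practice depends on the site; the sketch only illustrates the access pattern being described.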

    In addition to that, I think the umbraco.config file also gets cached in memory, so it isn't actually read from disk all the time (not 100% on this part though). A file-based system works fine for e.g. Lucene as well: although Lucene uses files it is very, very fast, even with huge dictionaries. Admittedly I share your concern a bit for really large web sites. However, on one of our web sites we have about 10,000 nodes and the umbraco.config file is about 1.5 MB. Scaling that linearly, 100,000 nodes would come to about 15 MB, which is not much.

    I don't really see your point about the redundancy in the database and XML. The structure of the Umbraco database is probably not the best one if a web site were actually served directly from it. But since that is not the case and all frontend requests are served by the XML cache, it is a very good schema for storing the data and fulfilling the additional features like versioning.
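The size estimate above is a simple linear extrapolation, which can be written out as a small Python sketch. The function name is made up for illustration, and the key assumption is that every node carries roughly the same amount of property data, so file size grows in proportion to node count.

```python
def estimate_cache_size_mb(known_nodes, known_size_mb, target_nodes):
    """Scale a measured cache file size linearly to another node count.

    Assumes the average amount of data per node stays the same.
    """
    return known_size_mb * target_nodes / known_nodes


# Figures quoted in the post: about 10,000 nodes in about 1.5 MB.
estimate = estimate_cache_size_mb(10_000, 1.5, 100_000)
print(f"{estimate:.1f} MB")
```

If the documents on another site have many more properties per node, the per-node size changes and the same formula gives a very different answer.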

    Hope that helps,
    Sascha

  • Ali Sheikh Taheri 470 posts 1648 karma points c-trib
    Aug 06, 2010 @ 12:17
    Ali Sheikh Taheri
    0

    Hi Sascha,

    Thanks for sharing your ideas.

    I partially agree with you, but the size of umbraco.config depends entirely on the number of properties you define for a document.

    I have got a website that has about 40,000 nodes and the umbraco.config file is about 25 MB.

    I completely agree with you about XSLT transformation, which is the easiest thing in the world :)

    But I am still wondering: will I end up with a very slow site if I use Umbraco for enterprise web sites? They might have half a million nodes, or even millions!

    Cheers

    Ali.

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Aug 09, 2010 @ 00:49
    Aaron Powell
    3

    A few points of note:

    • Databases should be on separate servers
    This can lead to latency accessing them, depending on the number of network hops, etc. There's also the potential of having to wait for the database server to spin up (if it's not super high traffic)
    • Databases store on a file system
    Sure it's highly optimized, but it's still a file system underneath ;)
    • RDBMS is optimised for write
    RDBMSs are great when you're trying to push data into them, but when you're trying to get data out they aren't always the quickest. Particularly when you're working with dynamic structures (such as Umbraco's) you can take a lot of performance hits doing reads. This is one of the reasons for slowness in the backend: the data structures of Umbraco can get really complex if you let them.
    There's really no "one solution to rule them all", so in Umbraco we run several different levels of caching which means that we can get the best of all worlds. In-memory is faster than a file system. Local storage is faster than network storage. Combined structures are better than dynamic structures.
    That's pretty much what makes up Umbraco:
    • XML in memory
    • XML on local disk
    • XML in database (cmsContentXml)
    • Raw data in database (most of the other tables)
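The layering listed above can be illustrated with a generic read-through lookup. This is a sketch of the general technique, not Umbraco's implementation: the `TieredStore` class, the tier names, and the use of plain dicts as stand-ins for memory, local disk, and the database are all assumptions made for the example.

```python
class TieredStore:
    """Look a key up in each tier, fastest first, and backfill on a hit."""

    def __init__(self, tiers):
        # tiers: list of (name, dict) pairs, ordered from fastest to slowest.
        self.tiers = tiers

    def get(self, key):
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Copy the value into every faster tier so that the next
                # read is answered without reaching this tier again.
                for _, faster in self.tiers[:index]:
                    faster[key] = value
                return value, name
        return None, None


# Plain dicts standing in for the real storage layers.
memory = {}
disk = {"1001": "<node id='1001' />"}
database = {"1001": "<node id='1001' />", "1002": "<node id='1002' />"}

store = TieredStore([("memory", memory), ("disk", disk), ("database", database)])

first = store.get("1002")   # found only in the slowest tier
second = store.get("1002")  # now answered from memory
print(first[1], second[1])
```

The design choice is the usual cache trade-off: the slowest tier holds the authoritative data, while the faster tiers hold copies that are cheaper to read but have to be refreshed when content changes.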

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Aug 09, 2010 @ 00:56
    Aaron Powell
    0

    Oh, I also meant to mention a few large scale websites running Umbraco:

    http://www.asp.net

    http://www.wired.co.uk

    http://www.wired.it

    http://www.warnerbros.com.au

  • Pickels 75 posts 108 karma points
    Aug 09, 2010 @ 18:58
    Pickels
    0

    How do those big sites manage their content? Do they have large content trees that they split up in smaller sub-trees?

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Aug 10, 2010 @ 01:05
    Aaron Powell
    0

    I can't speak for most of those sites, but with Warner Bros (which I worked on) we have the movie titles stored in a separate content tree from the 'pages' (it's a full Flash site so it doesn't really have pages).

    We also serve our content via Examine for a good portion of the site, rather than from the XML cache, as it's faster.

    The biggest site content-wise I've worked on has ~20,000 nodes, and again we've got a custom data layer sitting in front of Umbraco so that we can serve from our own level of cache.
