Copied to clipboard

Flag this post as spam?

This post will be reported to the moderators as potential spam to be looked at

  • overflew 87 posts 110 karma points
    Dec 12, 2010 @ 11:38

    SQL reporting over Umbraco content (including Propeerty Data)


    I've been asked to help a co-worker write some reports for the content we've now migrated into Umbraco. I've done a moderate amount of work with the Umbraco API, and even Linq2SQL against the Umbraco DB (which I'm now fairly familiar with), but relatively less with SQL itself. The reports must be written independent of .NET code. The reports will mostly be fed into graphs, and the occasional CSV file.

    The 2 sample queries I would be looking to write, would be:

    - To pull out all published instances of a particular document type, including key Property Data fields.

      - E.g., Pull out all current Product documents, including the Price, Weight and Manufacturer fields

    - To summarise all published instances of the above.

      - E.g. Pull out all counts of all Product documents, grouped by Manufacturer, which is sourced from a Prevalue

    With multiple Property Data items being stored as rows, not columns, I understand my options would be to look into 'Cross tab' SQL scripting, or XSLT parsing in SQL? Has anyone had any experience with either?

    I've previously seen some of the SQL in the source of umbraco.cms.businesslogic.web.Document, which is a little frightening, but I may need to get my head around...

    Any help with getting a foothold on this would be appreciated. Thanks.

  • Aaron Powell 1708 posts 3045 karma points c-trib
    Dec 12, 2010 @ 23:14
    Aaron Powell

    Trying to do this in the database itself is going to be just unpleasant, the database isn't designed to be queried like that.

    cmsContentType table is where you'll find the alias for a doc type

    umbracoNode will give you the Text (name) for a doc type as well as a doc in the tree

    cmsPropertyType is where you can find a property alias (and name)

    cmsPropertyData is the data for a property on a document, it also contains all versions


    That should get you started

  • Vince 1 post 71 karma points notactivated
    Mar 16, 2017 @ 23:13

    How are we expected to query the database without writing C# code and spinning up the entire site just to get some information out?

    Sorry for my ignorance with umbraco.

  • overflew 87 posts 110 karma points
    Dec 12, 2010 @ 23:25

    Thanks Slace,

    I'm confident I can pull out all cmsDocuments of a given content type - it's mostly around pulling out (just a few) of their associated property datas - the unpleasant part you mention...

    This would be an offline process that's just run once a week, so it can be a little hairy, but I'm just trying to get to grips with how I'd pull out, say 2x columns of property data.

  • Aaron Powell 1708 posts 3045 karma points c-trib
    Dec 12, 2010 @ 23:34
    Aaron Powell

    You can only do it by using sub queries, since the property data is all in the cmsPropertyData table  you can't just do a SELECT (all properties) WHERE docId == 1234;

    You have to have something like SELECT ( SELECT data WHERE alias == "myProperty" ) WHERE docId ==1234

    It'll end up being slow and brittle too

  • overflew 87 posts 110 karma points
    Jan 27, 2011 @ 08:24

    Wow. So I ended up writing a few similar to those above - Mostly where I just needed to pull out one property value per document.

    I also had a play with SQL pivot tables - which can work if all your data is in the same data type (e.g. dataNText, or dataDate)

    The most recent discovery is XML querying in SQL!

    -- Create temporary table that uses the XML data type
    create table #XmlTable (
    NodeId int,
    NodeXml xml
    -- Move all XML for documents of type '1121' into there
    insert into #XmlTable(NodeId, NodeXml)
    select cx.* from cmsContentXml cx
    join cmsContent cc on cc.nodeId = cx.nodeId
    where cc.contentType = 1121
    select un.text [Document Name], 
    xt.NodeXml.query('//node/data [@alias=''fieldOne'']').value('.', 'nvarchar(max)') [Field One],
    xt.NodeXml.query('//node/data [@alias=''fieldTwo'']').value('.', 'nvarchar(max)') [Field Two],
    xt.NodeXml.query('//node/data [@alias=''fieldThree'']').value('.', 'nvarchar(max)') [Field Three]
     from #XmlTable xt
     join umbracoNode un on = xt.NodeId
    -- Drop temporary table when finished
    drop table #XmlTable

    So there you go - a slightly more sustainable way to pull query across values.

    If I understood it better, I guess it could be made more efficient. I have yet to figure out if there's a way to skip the temporary table, or if I can shorten the lines that pull out values.

    Of all the tutorials that explained XML querying, this one helped the most -

    Anyone taking a punt on this may also come across the error:

    XQuery [value()]: 'value()' requires a singleton (or empty sequence),
    found operand of type 'xdt:untypedAtomic *'

    Which is described in this post -

    I assume the reason why this field is NText, not xml datatype in the Umbraco is that

    - It consumes extra SQL resource,

    - The code's unlikely to fire off these XML queries directly.

    - I'm unsure if it works in MySql, etc...

    Hope this helps anyone else! Any if you're a SQL whiz, let me know if it can be done any more efficiently!


Please Sign in or register to post replies

Write your reply to: