SQL reporting over Umbraco content (including Propeerty Data)
Hi,
I've been asked to help a co-worker write some reports for the content we've now migrated into Umbraco. I've done a moderate amount of work with the Umbraco API, and even Linq2SQL against the Umbraco DB (which I'm now fairly familiar with), but relatively less with SQL itself. The reports must be written independent of .NET code. The reports will mostly be fed into graphs, and the occasional CSV file.
The 2 sample queries I would be looking to write, would be:
- To pull out all published instances of a particular document type, including key Property Data fields.
- E.g., Pull out all current Product documents, including the Price, Weight and Manufacturer fields
- To summarise all published instances of the above.
- E.g. Pull out all counts of all Product documents, grouped by Manufacturer, which is sourced from a Prevalue
With multiple Property Data items being stored as rows, not columns, I understand my options would be to look into 'Cross tab' SQL scripting, or XSLT parsing in SQL? Has anyone had any experience with either?
I've previously seen some of the SQL in the source of umbraco.cms.businesslogic.web.Document, which is a little frightening, but I may need to get my head around...
Any help with getting a foothold on this would be appreciated. Thanks.
I'm confident I can pull out all cmsDocuments of a given content type - it's mostly around pulling out (just a few) of their associated property datas - the unpleasant part you mention...
This would be an offline process that's just run once a week, so it can be a little hairy, but I'm just trying to get to grips with how I'd pull out, say 2x columns of property data.
You can only do it by using sub queries, since the property data is all in the cmsPropertyData table you can't just do a SELECT (all properties) WHERE docId == 1234;
You have to have something like SELECT ( SELECT data WHERE alias == "myProperty" ) WHERE docId ==1234
Wow. So I ended up writing a few similar to those above - Mostly where I just needed to pull out one property value per document.
I also had a play with SQL pivot tables - which can work if all your data is in the same data type (e.g. dataNText, or dataDate)
The most recent discovery is XML querying in SQL!
-- Create temporary table that uses the XML data type
create table #XmlTable (
NodeId int,
NodeXml xml
)
-- Move all XML for documents of type '1121' into there
insert into #XmlTable(NodeId, NodeXml)
select cx.* from cmsContentXml cx
join cmsContent cc on cc.nodeId = cx.nodeId
where cc.contentType = 1121
select un.text [Document Name],
xt.NodeXml.query('//node/data [@alias=''fieldOne'']').value('.', 'nvarchar(max)') [Field One],
xt.NodeXml.query('//node/data [@alias=''fieldTwo'']').value('.', 'nvarchar(max)') [Field Two],
xt.NodeXml.query('//node/data [@alias=''fieldThree'']').value('.', 'nvarchar(max)') [Field Three]
from #XmlTable xt
join umbracoNode un on un.id = xt.NodeId
-- Drop temporary table when finished
drop table #XmlTable
So there you go - a slightly more sustainable way to pull query across values.
If I understood it better, I guess it could be made more efficient. I have yet to figure out if there's a way to skip the temporary table, or if I can shorten the lines that pull out values.
SQL reporting over Umbraco content (including Propeerty Data)
Hi,
I've been asked to help a co-worker write some reports for the content we've now migrated into Umbraco. I've done a moderate amount of work with the Umbraco API, and even Linq2SQL against the Umbraco DB (which I'm now fairly familiar with), but relatively less with SQL itself. The reports must be written independent of .NET code. The reports will mostly be fed into graphs, and the occasional CSV file.
The 2 sample queries I would be looking to write, would be:
- To pull out all published instances of a particular document type, including key Property Data fields.
- E.g., Pull out all current Product documents, including the Price, Weight and Manufacturer fields
- To summarise all published instances of the above.
- E.g. Pull out all counts of all Product documents, grouped by Manufacturer, which is sourced from a Prevalue
With multiple Property Data items being stored as rows, not columns, I understand my options would be to look into 'Cross tab' SQL scripting, or XSLT parsing in SQL? Has anyone had any experience with either?
I've previously seen some of the SQL in the source of umbraco.cms.businesslogic.web.Document, which is a little frightening, but I may need to get my head around...
Any help with getting a foothold on this would be appreciated. Thanks.
Trying to do this in the database itself is going to be just unpleasant, the database isn't designed to be queried like that.
cmsContentType table is where you'll find the alias for a doc type
umbracoNode will give you the Text (name) for a doc type as well as a doc in the tree
cmsPropertyType is where you can find a property alias (and name)
cmsPropertyData is the data for a property on a document, it also contains all versions
That should get you started
How are we expected to query the database without writing C# code and spinning up the entire site just to get some information out?
Sorry for my ignorance with umbraco.
Thanks Slace,
I'm confident I can pull out all cmsDocuments of a given content type - it's mostly around pulling out (just a few) of their associated property datas - the unpleasant part you mention...
This would be an offline process that's just run once a week, so it can be a little hairy, but I'm just trying to get to grips with how I'd pull out, say 2x columns of property data.
You can only do it by using sub queries, since the property data is all in the cmsPropertyData table you can't just do a SELECT (all properties) WHERE docId == 1234;
You have to have something like SELECT ( SELECT data WHERE alias == "myProperty" ) WHERE docId ==1234
It'll end up being slow and brittle too
Wow. So I ended up writing a few similar to those above - Mostly where I just needed to pull out one property value per document.
I also had a play with SQL pivot tables - which can work if all your data is in the same data type (e.g. dataNText, or dataDate)
The most recent discovery is XML querying in SQL!
So there you go - a slightly more sustainable way to pull query across values.
If I understood it better, I guess it could be made more efficient. I have yet to figure out if there's a way to skip the temporary table, or if I can shorten the lines that pull out values.
Of all the tutorials that explained XML querying, this one helped the most - http://blogs.msdn.com/b/mrorke/archive/2005/07/21/441554.aspx.
Anyone taking a punt on this may also come across the error:
Which is described in this post - http://milambda.blogspot.com/2006/11/sql-server-2005-xml-methods-part-one.html
I assume the reason why this field is NText, not xml datatype in the Umbraco is that
- It consumes extra SQL resource,
- The code's unlikely to fire off these XML queries directly.
- I'm unsure if it works in MySql, etc...
Hope this helps anyone else! Any if you're a SQL whiz, let me know if it can be done any more efficiently!
Thanks.
is working on a reply...