Import structured Word documents to Umbraco
My client has an existing CMS which they use to distribute web-based content. Its only redeeming feature is that the client can upload structured Word documents and it creates a content page for each section in the document.
Has anyone done this in Umbraco?
I know that you can use Word to create blog posts -- that's not what I want to do.
To be clear, the source documents will be structured using Word's heading paragraph styles: Heading 1 sections will be created as DocTypeA, Heading 2 sections as DocTypeB (always appended below their parent), and so on.
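The mapping described above is essentially a stack walk over the paragraphs in document order: each heading closes any open sections at the same or a deeper level and becomes a child of the nearest shallower heading. A minimal Python sketch, assuming the paragraphs have already been extracted as `(style_id, text)` pairs and that headings use Word's built-in style IDs `Heading1`, `Heading2`, and so on. `DOC_TYPE_BY_LEVEL`, `Node`, `build_tree` and the `DocTypeLevelN` fallback are hypothetical names; actually creating the Umbraco nodes through its API is left out.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from Word heading level to Umbraco document type alias.
DOC_TYPE_BY_LEVEL = {1: "DocTypeA", 2: "DocTypeB"}


@dataclass
class Node:
    doc_type: str
    title: str
    level: int
    body: list = field(default_factory=list)
    children: list = field(default_factory=list)


def heading_level(style):
    """Return 1 for 'Heading1', 2 for 'Heading2', ...; None for any other style."""
    name = style or ""
    digits = name[len("Heading"):]
    if name.startswith("Heading") and digits.isdecimal() and int(digits) > 0:
        return int(digits)
    return None


def build_tree(paragraphs):
    """Turn (style_id, text) pairs, in document order, into a tree of Nodes."""
    root = Node(doc_type="root", title="", level=0)
    stack = [root]  # path from the root to the section currently receiving text
    for style, text in paragraphs:
        level = heading_level(style)
        if level is None:
            stack[-1].body.append(text)  # body text belongs to the nearest heading above
            continue
        # Close sections at the same or a deeper level; what remains on top is the parent.
        while stack[-1].level >= level:
            stack.pop()
        doc_type = DOC_TYPE_BY_LEVEL.get(level, f"DocTypeLevel{level}")  # hypothetical fallback
        node = Node(doc_type, text, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def dump(node, depth=0):
    """Print the tree as an indented outline."""
    for child in node.children:
        print("  " * depth + f"{child.doc_type}: {child.title}")
        dump(child, depth + 1)


if __name__ == "__main__":
    sample = [
        ("Heading1", "Introduction"),
        ("Normal", "Some intro text."),
        ("Heading2", "Scope"),
        ("Normal", "Scope details."),
        ("Heading1", "Next steps"),
    ]
    dump(build_tree(sample))
```

Run on the sample, this prints `DocTypeA: Introduction`, then an indented `DocTypeB: Scope`, then `DocTypeA: Next steps`. Any text before the first heading ends up in the root's `body`.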
Thanks guys
Hi, Paul,
I once dabbled in converting the DOCX format (which is XML-based) into Umbraco content. The theory is easy enough, but because of all the ways a Word document might be composed, the implementation was going to take longer than the project needs could justify. In principle it's not difficult: just unzip the .docx file (yes, it's really just a zip file with a manifest... kind of like an Umbraco package), load the XML into XSLT, .NET or Razor, and parse the relevant bits. Any images in the file are stored in the zip, so you can import those too if that's part of the requirement.
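The unzip-and-parse idea can be sketched with nothing but the Python standard library. A .docx is a zip archive whose main body lives in `word/document.xml` (WordprocessingML): each paragraph is a `w:p` element, its style ID sits in `w:pPr/w:pStyle/@w:val`, and its text is split across `w:t` elements inside runs; embedded images are stored under `word/media/`. This is a minimal sketch, not a robust parser: `read_paragraphs` and `list_media` are hypothetical names, paragraphs inside tables are flattened into the same sequence, tabs and line breaks are dropped, and style IDs can differ in non-English versions of Word.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx):
    """Yield (style_id, text) for each paragraph in a .docx (path or file object)."""
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find(f"{W}body")
    # iter() also descends into tables, so table paragraphs are included in order.
    for p in body.iter(f"{W}p"):
        style_el = p.find(f"{W}pPr/{W}pStyle")
        style = style_el.get(f"{W}val") if style_el is not None else None
        # A paragraph's text is spread over runs (<w:r>), each holding <w:t> elements.
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        yield style, text


def list_media(docx):
    """Return the archive names of embedded media (images live under word/media/)."""
    with zipfile.ZipFile(docx) as z:
        return [name for name in z.namelist() if name.startswith("word/media/")]
```

For example, `for style, text in read_paragraphs("report.docx"): print(style, text)` lists each paragraph with its style ID (the file name is a placeholder). Heading paragraphs come back as `Heading1`, `Heading2`, ..., and unstyled paragraphs as `None`.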
Let us know how you get on with it.
cheers,
doug.
I'm thinking that parsing the XML might be a headache, and that another way of doing it might be a VBA macro running inside Word that makes calls to the Umbraco API. Or am I just nuts?
Sure, that's another approach, though I'd have thought working in full C# .NET would give you more tools for handling both the source document and the Umbraco API than a VBA macro would. But perhaps a VBA macro could export a CSV file that CMSimport processes for you? It'd be a two-step process, but the least work on your part.
You could also use Umbraco's built-in task scheduler (see the umbracoSettings.config file), or CMSimport could run imports periodically based on files in a folder of your site, where your VBA macro would drop them.
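For the two-step route, the main design question is how to carry the heading hierarchy through a flat CSV file. One option, sketched here in Python rather than VBA, is to give each section a row ID and a `parentId` column, then drop the file into the watched folder under a temporary name and rename it into place, so a scheduled import never reads a half-written file. The column names, the `DOC_TYPES` mapping and both function names are hypothetical; they are not a layout CMSimport is known to require, and the rename trick assumes the importer ignores `.tmp` files.

```python
import csv
import io
import os
import tempfile

# Hypothetical mapping from heading level to Umbraco document type alias.
DOC_TYPES = {1: "DocTypeA", 2: "DocTypeB"}


def sections_to_csv(paragraphs):
    """Flatten (style_id, text) paragraphs into CSV text with one row per heading.

    The hierarchy is kept as a parentId column (empty for top-level sections).
    """
    rows = []   # [id, parentId, docType, name, body_lines]
    stack = []  # (level, id) of the headings currently open, shallowest first
    for style, text in paragraphs:
        name = style or ""
        digits = name[len("Heading"):]
        if not (name.startswith("Heading") and digits.isdecimal()):
            if rows:
                rows[-1][4].append(text)  # body text belongs to the latest heading
            continue
        level = int(digits)
        while stack and stack[-1][0] >= level:
            stack.pop()  # close sections at the same or a deeper level
        parent_id = stack[-1][1] if stack else ""
        row_id = len(rows) + 1
        rows.append([row_id, parent_id, DOC_TYPES.get(level, f"Level{level}"), text, []])
        stack.append((level, row_id))

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "parentId", "docType", "name", "bodyText"])
    for row_id, parent_id, doc_type, title, body in rows:
        writer.writerow([row_id, parent_id, doc_type, title, "\n".join(body)])
    return buf.getvalue()


def drop_into_folder(csv_text, folder, filename):
    """Write under a temporary name, then rename into place in one step."""
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
        f.write(csv_text)
    final_path = os.path.join(folder, filename)
    os.replace(tmp_path, final_path)  # atomic when both paths are on the same filesystem
    return final_path
```

Writing the temporary file in the target folder itself (rather than the system temp directory) keeps both paths on one filesystem, which is what makes the final `os.replace` atomic.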
I'd love to see a general solution to this; it would be a killer feature to demo and to use in real life for many clients.
cheers,
doug.
Hey,
Just a pointer if you do want to parse the Word document: look into VSTO and the Word Content Control Toolkit (http://dbe.codeplex.com/).
I used it at a law firm to turn Umbraco into a Word template management system (basically using the Umbraco doc type fields to hold template content).
It worked OK, but the principles may apply to what you want to do.
Jay