  • Paul Marden 235 posts 338 karma points MVP c-trib
    Oct 11, 2011 @ 16:25
    Paul Marden
    0

    Import structured word documents to Umbraco

    My client has an existing CMS which they use to distribute web-based content.  Its only redeeming feature is that the client can upload content from structured Word documents, and it creates a content page for each section in the document.

    Has anyone done this in Umbraco?  

    I know that you can use Word to create blog posts -- that's not what I want to do.

    To be clear, the source documents will be structured using the heading levels in Word's paragraph styles, so each Heading Level 1 section will be created as DocTypeA, while each Heading Level 2 section will be created as DocTypeB and always appended below its parent, and so on for deeper levels.
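A minimal sketch of the nesting rule described above, assuming the headings have already been extracted as `(level, title)` pairs. The function name `build_tree` and the dictionary layout are hypothetical; only `DocTypeA` and `DocTypeB` come from the post, and nothing here touches the Umbraco API.

```python
# Hypothetical mapping from heading level to document type alias.
DOC_TYPES = {1: "DocTypeA", 2: "DocTypeB"}


def build_tree(headings):
    """Nest (level, title) pairs so each heading sits under the nearest
    preceding heading with a smaller level number."""
    root = {"title": None, "doc_type": None, "level": 0, "children": []}
    stack = [root]  # path from the root down to the most recent node
    for level, title in headings:
        # Climb back up until the top of the stack is a valid parent.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {
            "title": title,
            "doc_type": DOC_TYPES.get(level),  # None for unmapped levels
            "level": level,
            "children": [],
        }
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

Walking the returned tree depth-first gives the order in which pages would need to be created, since each parent exists before its children.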

    Thanks guys

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Oct 11, 2011 @ 18:43
    Douglas Robar
    0

    Hi, Paul,

    I once dabbled in converting the docx format (which is XML-based) into Umbraco content. The theory is easy enough, but because of all the ways a doc file might be composed, the implementation was going to take longer than could be justified by the project needs. In principle it's not difficult: just unzip the docx file (yes, it's really just a zip file with a manifest... kinda like an Umbraco package), load the XML into XSLT or .NET or Razor, and parse the relevant bits. Any images in the file are stored in the zip, so you can import those too if that's part of the requirement.
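As a rough illustration of the unzip-and-parse idea, here is a small Python sketch (the thread itself suggests XSLT, .NET or Razor, so the language is a swap for brevity). It assumes the main body lives at `word/document.xml`, which is the usual location but is strictly determined by the package relationships, and that heading paragraphs carry style ids such as `Heading1`, as in Word's default English templates. The function name `read_paragraphs` is made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in the main document part.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_paragraphs(docx_path):
    """Return a list of (style_id, text) tuples, one per paragraph."""
    with zipfile.ZipFile(docx_path) as z:
        # Assumption: the main part is at the conventional path.
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        # A paragraph's text is split across runs; join every w:t element.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        paragraphs.append((style_id, text))
    return paragraphs
```

Paragraphs with no explicit style come back with `None` as the style id, so body text can be attached to whichever heading preceded it. Images, tables and numbering are ignored here, and those are where most of the real effort would go.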

    Let us know how you get on with it.

    cheers,
    doug. 

  • Paul Marden 235 posts 338 karma points MVP c-trib
    Oct 11, 2011 @ 18:48
    Paul Marden
    0

    I'm thinking that parsing the XML might be a headache and that actually another way of doing it might be as a VBA macro running inside Word that makes calls to the Umbraco API - or am I just nuts?

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Oct 11, 2011 @ 19:04
    Douglas Robar
    0

    Sure, that's another approach, though I'd have thought working in full C# .NET would give you more tools to work with both the source document and the Umbraco API than a VBA macro would. But perhaps a VBA macro to export a CSV file that CMSImport could process for you? It'd be a two-step process but the least work on your part.
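To make the CSV hand-off concrete, here is a hedged sketch of the export step in Python rather than VBA. The column names (`title`, `doc_type`, `parent_title`, `body`) are purely illustrative assumptions; the columns CMSImport actually expects depend on how the import is mapped, so check them against your own setup.

```python
import csv
import io

# Hypothetical column layout; adjust to match the import mapping you define.
FIELDS = ["title", "doc_type", "parent_title", "body"]


def sections_to_csv(sections):
    """Serialise a list of section dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for section in sections:
        # The csv module quotes commas, quotes and newlines in field values.
        writer.writerow(section)
    return buf.getvalue()
```

Writing the returned text to a file in the watched folder would complete the first of the two steps.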

    And you could use umbraco's built-in task scheduler (see the umbracoSettings.config file) or CMSimport could run imports periodically based on files in some folder of your site... where your vba macro would drop them.

    I'd love to see a general solution to this, it would be a killer feature to demo and use in real life for many clients.

    cheers,
    doug. 

  • jaygreasley 416 posts 403 karma points
    Oct 11, 2011 @ 20:50
    jaygreasley
    0

    Hey,

    Just a pointer: if you do want to parse the Word document, look into VSTO and the Word Content Control Toolkit ( http://dbe.codeplex.com/ ).

    I used it in a law firm to use Umbraco as a Word Template Management System. (basically using the Umbraco doc type fields to hold template content).

    It worked OK, and the principles may apply to what you want to do.

    Jay

     
