  • Connie DeCinko 931 posts 1160 karma points
    Feb 09, 2011 @ 19:33
    Connie DeCinko
    0

    Retrieve and display static HTML in body area?

    I've got several years worth of archives of an HTML email newsletter that I need to include in our new Umbraco web site.  I can manually paste each issue into a rich text editor body text field, but it's very time consuming and error prone.

    Does anyone know of a way to automatically/dynamically suck that content into an Umbraco page?  Or at least index it and include it in my site search and then generate a link list to the issues?

     

  • Rich Green 2246 posts 4008 karma points
    Feb 09, 2011 @ 19:47
    Rich Green
    0

    Hey Connie,

    Are the newsletters in a consistent format so you can extract data from them?

    Apparently SQL Server DTS Import can read and parse HTML/web pages, so you could try that or something similar.

    If you can get the content into any database format, you could easily import it into Umbraco using CMS Import: http://our.umbraco.org/projects/developer-tools/cmsimport

    Rich

  • Connie DeCinko 931 posts 1160 karma points
    Feb 09, 2011 @ 20:01
    Connie DeCinko
    0

    They go all the way back to 2002 so they start with iffy structure and correct format, going to current day with clean format.  Is there an HTML to XML converter that I can try online without having to install one?

     

  • Mike Chambers 636 posts 1253 karma points c-trib
    Feb 10, 2011 @ 00:02
    Mike Chambers
    0

    I haven't used it myself...

    http://blackpoint.dk/umbraco-workbench/packages/tidyhtml.aspx?p=0

    I think the source is available if you need to update it to be 4.5 schema compliant.

  • Connie DeCinko 931 posts 1160 karma points
    Feb 10, 2011 @ 16:15
    Connie DeCinko
    0

    Interesting package but it looks like a way to get data when the source does not provide a webservice or RSS feed.  For my use, I want to keep all the formatting.

  • Richard Soeteman 4054 posts 12927 karma points MVP 2x
    Feb 10, 2011 @ 20:28
    Richard Soeteman
    0

    Hi,

    How are the emails stored? As mentioned before, you can use CMSImport for this. Depending on the format they are stored in now, it's just a matter of stepping through a wizard and selecting a few items in a pulldown menu.

    Cheers,

    Richard

     

  • Connie DeCinko 931 posts 1160 karma points
    Feb 10, 2011 @ 21:58
  • Richard Soeteman 4054 posts 12927 karma points MVP 2x
    Feb 11, 2011 @ 06:07
    Richard Soeteman
    0

    What you could do is wrap the whole HTML of the page inside an XML file, see the example below. Then you can map the elements against the document properties. Another option is to export the files to HTML and store them in a separate folder in your Umbraco install. Then you don't have to include them, but you can reference them from the site and they will be included in search results.

    Cheers,

    Richard

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <xml>
      <Page>
        <pageTitle>Your title here</pageTitle>
        <bodyText>
          <![CDATA[Your HTML goes here]]>
        </bodyText>
      </Page>
    </xml>

     
  • Connie DeCinko 931 posts 1160 karma points
    Feb 11, 2011 @ 15:40
    Connie DeCinko
    0

    I was reading about CDATA yesterday and am intrigued.  I wondered if there could be a CDATA data type in Umbraco for places where it would be helpful for users to copy and paste html data into an Umbraco page.  I tried pasting the HTML into a RTE field but in some cases it causes a YSOD.

     

  • Richard Soeteman 4054 posts 12927 karma points MVP 2x
    Feb 11, 2011 @ 15:54
    Richard Soeteman
    0

    I think you can do that with a Textbox Multiple datatype. Just paste the HTML into that field and see what happens.

  • Connie DeCinko 931 posts 1160 karma points
    Feb 11, 2011 @ 18:16
    Connie DeCinko
    0

    That actually works well.  But, alas, I think I am going to give up, as the embedded style sheet in each newsletter throws off the formatting of the containing page.  In the end, I'm better off leaving each newsletter as a standalone container, warts and all, and just iFrame it in my Umbraco site.

    Now, I'd love to find a tool that would crawl the newsletters and save them as static HTML as I discovered they are still in ColdFusion.  Otherwise I have to manually save about 500 of them.

     

  • Profiterole 232 posts 264 karma points
    Feb 12, 2011 @ 02:25
    Profiterole
    0

    I'm not sure I understand you correctly, but you could use http://www.httrack.com/page/1/en/index.html to save your pages as static HTML and then import them with CMSImport.

  • Richard Soeteman 4054 posts 12927 karma points MVP 2x
    Feb 12, 2011 @ 07:08
    Richard Soeteman
    0

    Yep, HTTrack is the way to download the files, but then you still need to wrap them in an XML document as I mentioned before. That wouldn't be too hard: just loop over all the files, maybe extract some data such as the page title using HTML Agility Pack, and then import it into Umbraco using CMSImport.
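
As a rough sketch of that loop (not Richard's actual code), the wrapping step might look something like the Python below. It assumes the downloaded newsletters are `*.html` files in one folder, reuses the `xml`, `Page`, `pageTitle` and `bodyText` element names from the example earlier in the thread, and swaps a simple regular expression in for HTML Agility Pack to pull out the `<title>`. The function names (`extract_title`, `cdata`, `build_import_xml`) are made up for illustration.

```python
import re
from html import unescape
from pathlib import Path
from xml.sax.saxutils import escape

# Assumption: a regex is good enough to find <title>; a real HTML parser
# would be more robust on the older, messier issues.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html, fallback):
    """Return the text of the first <title> element, or the fallback."""
    match = TITLE_RE.search(html)
    if not match:
        return fallback
    title = " ".join(unescape(match.group(1)).split())
    return title or fallback


def cdata(text):
    """Wrap text in CDATA; a literal "]]>" is split across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def page_element(title, html):
    """Build one <Page> element holding the title and the raw HTML."""
    return (
        "  <Page>\n"
        f"    <pageTitle>{escape(title)}</pageTitle>\n"
        f"    <bodyText>{cdata(html)}</bodyText>\n"
        "  </Page>\n"
    )


def build_import_xml(folder):
    """Wrap every *.html file in the folder into a single XML document."""
    pages = []
    for path in sorted(Path(folder).glob("*.html")):
        # Assumption: the files are ISO-8859-1, matching the XML declaration.
        html = path.read_text(encoding="iso-8859-1")
        pages.append(page_element(extract_title(html, path.stem), html))
    header = '<?xml version="1.0" encoding="iso-8859-1" ?>\n<xml>\n'
    return header + "".join(pages) + "</xml>\n"
```

Calling `build_import_xml("newsletters")` (a hypothetical folder name) returns the XML as a string. It can be written out with `Path("import.xml").write_bytes(xml_text.encode("iso-8859-1", errors="xmlcharrefreplace"))` and then used as the import source. Issues without a `<title>` fall back to the file name, so every page still gets a title to map.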
