Retrieve and display static HTML in body area?

Connie DeCinko 931 posts 1160 karma points

Feb 09, 2011 @ 19:33

I've got several years' worth of archives of an HTML email newsletter that I need to include in our new Umbraco web site. I can manually paste each issue into a rich text editor body text field, but it's very time-consuming and error-prone.

Does anyone know of a way to automatically/dynamically suck that content into an Umbraco page? Or at least index it and include it in my site search and then generate a link list to the issues?

Rich Green 2246 posts 4008 karma points

Feb 09, 2011 @ 19:47

Hey Connie,

Are the newsletters in a consistent format so you can extract data from them?

Apparently SQL Server DTS Import is capable of reading and parsing HTML/web pages, so you could try that or something similar.

If you can get it into any database format then you could easily import the content into Umbraco using CMS Import http://our.umbraco.org/projects/developer-tools/cmsimport
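If the extracted issues end up as title/body pairs, a flat CSV file is one simple intermediate format. The sketch below is a minimal Python example, assuming your CMSImport version accepts CSV as a data source and that your document type has properties you can map to the hypothetical `pageTitle` and `bodyText` columns; `records_to_csv` is an illustrative name, not part of any tool mentioned in this thread.

```python
import csv
import io


def records_to_csv(records):
    """Serialise (title, html) pairs into CSV text with a header row.

    The csv module quotes any field containing commas, quotes or line
    breaks, so multi-line newsletter HTML survives as a single field.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Hypothetical column names: map them to your document type's
    # properties in the CMSImport wizard.
    writer.writerow(["pageTitle", "bodyText"])
    for title, html in records:
        writer.writerow([title, html])
    return buffer.getvalue()
```

When writing the result to disk, open the file with `newline=""`, as the Python `csv` documentation recommends, so embedded line breaks are not translated.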

Rich

Connie DeCinko 931 posts 1160 karma points

Feb 09, 2011 @ 20:01

They go all the way back to 2002, so the early issues have iffy structure and formatting, while the current ones are cleanly formatted. Is there an HTML-to-XML converter that I can try online without having to install one?

Mike Chambers 636 posts 1253 karma points c-trib

Feb 10, 2011 @ 00:02

Not used it myself...

http://blackpoint.dk/umbraco-workbench/packages/tidyhtml.aspx?p=0

I think the source is available if you need to update it to be compliant with the 4.5 schema.

Connie DeCinko 931 posts 1160 karma points

Feb 10, 2011 @ 16:15

Interesting package, but it looks like a way to get data when the source does not provide a web service or RSS feed. For my use, I want to keep all the formatting.

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Feb 10, 2011 @ 20:28

Hi,

How are the emails stored? As mentioned before, you can use CMSImport for this. Depending on the format they're stored in now, it's just a matter of stepping through a wizard and selecting a few items in a pull-down menu.

Cheers,

Richard

Connie DeCinko 931 posts 1160 karma points

Feb 10, 2011 @ 21:58

The content looks like:

http://www.myazbar.org/eLegal/archives/020611/home.cfm

http://www.myazbar.org/eLegal/archives/060808/home.cfm
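The two archive URLs suggest each issue lives in a folder named after its date, apparently in MMDDYY form (020611 would be February 6, 2011). Assuming that pattern holds for every issue, a link list of the archive could be generated from the URLs alone. This is a minimal Python sketch; `issue_date` and `link_list` are hypothetical helper names.

```python
from datetime import datetime
from html import escape
from urllib.parse import urlparse


def issue_date(url):
    """Parse the date from the folder name in a URL such as
    .../eLegal/archives/060808/home.cfm (assumed MMDDYY naming)."""
    folder = urlparse(url).path.rstrip("/").split("/")[-2]
    return datetime.strptime(folder, "%m%d%y").date()


def link_list(urls):
    """Return an HTML <ul> linking to each issue, newest first."""
    lines = ["<ul>"]
    for url in sorted(urls, key=issue_date, reverse=True):
        label = issue_date(url).strftime("%B %d, %Y")
        lines.append(f'  <li><a href="{escape(url)}">{escape(label)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

Two-digit years follow `strptime`'s `%y` rule (00 to 68 map to 2000 to 2068), which covers an archive starting in 2002.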

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Feb 11, 2011 @ 06:07
What you could do is wrap the whole HTML of each page inside an XML file (see the example below). Then you can map those elements to your properties in CMSImport. Another option is to export the files as HTML and store them in a separate folder in your Umbraco install. Then you don't have to import them into Umbraco, but you can reference them from the site and they will be included in search results.

Cheers,

Richard
```xml
<?xml version="1.0" encoding="iso-8859-1" ?>
<xml>
  <Page>
    <pageTitle>Your title here</pageTitle>
    <!-- CDATA keeps the newsletter HTML from being parsed as XML markup -->
    <bodyText><![CDATA[
Your HTML goes here]]></bodyText>
  </Page>
</xml>
```

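Writing that XML by hand for hundreds of issues is tedious, so a script can generate it. Below is a minimal Python sketch that produces the same `<xml>`/`<Page>`/`<pageTitle>`/`<bodyText>` layout as the example above, one `<Page>` per issue. One detail worth handling: a CDATA section ends at the first `]]>`, so any literal `]]>` inside the HTML is split across two adjacent sections. The function names are hypothetical, and declaring UTF-8 instead of iso-8859-1 is an assumption on my part, since older newsletters may contain characters outside Latin-1.

```python
from xml.sax.saxutils import escape


def cdata(text):
    """Wrap text in a CDATA section. A literal ']]>' would close the
    section early, so it is split across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def wrap_pages(pages):
    """Build one XML document with a <Page> element per (title, html) pair."""
    parts = ['<?xml version="1.0" encoding="utf-8" ?>', "<xml>"]
    for title, html in pages:
        parts.append("  <Page>")
        # Titles are plain text, so ordinary escaping of &, < and > is enough.
        parts.append(f"    <pageTitle>{escape(title)}</pageTitle>")
        # Bodies keep their raw HTML inside CDATA.
        parts.append(f"    <bodyText>{cdata(html)}</bodyText>")
        parts.append("  </Page>")
    parts.append("</xml>")
    return "\n".join(parts)
```

Save the result with `open(path, "w", encoding="utf-8")` so the bytes on disk match the declared encoding.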
Connie DeCinko 931 posts 1160 karma points

Feb 11, 2011 @ 15:40

I was reading about CDATA yesterday and am intrigued. I wondered if there could be a CDATA data type in Umbraco for places where it would be helpful for users to copy and paste HTML into an Umbraco page. I tried pasting the HTML into an RTE field, but in some cases it causes a YSOD (the ASP.NET "Yellow Screen of Death" error page).

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Feb 11, 2011 @ 15:54

I think you can do that with a Textbox multiple datatype. Just paste the HTML into that field and see what happens.

Connie DeCinko 931 posts 1160 karma points

Feb 11, 2011 @ 18:16

That actually works well. But, alas, I think I am going to give up, as the embedded style sheet in each newsletter throws off the formatting of the containing page. In the end, I'm better off leaving each newsletter as a standalone container, warts and all, and just iframing it into my Umbraco site.

Now, I'd love to find a tool that would crawl the newsletters and save them as static HTML, since I discovered they are still in ColdFusion. Otherwise I have to save about 500 of them manually.

Profiterole 232 posts 264 karma points

Feb 12, 2011 @ 02:25

I'm not sure I understand you correctly, but you could use http://www.httrack.com/page/1/en/index.html to save your pages as static HTML and then import them with CMSImport.
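For a one-off archive grab, a short script can cover the HTML-saving part of what HTTrack does. This is a minimal Python sketch, assuming the `archives/MMDDYY/home.cfm` pattern seen in the two example links holds for every issue and that you already have the list of issue folder names (for example from the archive index page). `download` and `save_issues` are hypothetical names. Unlike HTTrack, it saves only each page's HTML: it does not fetch stylesheets or images, and it does not rewrite links.

```python
import pathlib
import time
import urllib.request

# Assumed URL pattern, inferred from the two example archive links.
ARCHIVE_URL = "http://www.myazbar.org/eLegal/archives/{issue}/home.cfm"


def download(url, timeout=30):
    """Fetch a URL and return the raw response bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_issues(issues, out_dir, fetch=download, delay=1.0):
    """Save each issue's page as <issue>.html in out_dir.

    Bytes are written unchanged, so each page keeps its original encoding.
    `fetch` is injectable so the loop can be tested without network access.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for issue in issues:
        target = out / f"{issue}.html"
        target.write_bytes(fetch(ARCHIVE_URL.format(issue=issue)))
        saved.append(target)
        time.sleep(delay)  # pause between requests to go easy on the server
    return saved
```

Usage would be along the lines of `save_issues(["020611", "060808"], "newsletters")`.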

Richard Soeteman 4054 posts 12927 karma points MVP 2x

Feb 12, 2011 @ 07:08

Yep, HTTrack is the way to download the files, but then you still need to wrap them in an XML document as I mentioned before. That wouldn't be too hard: just loop over all the files, maybe extract some data such as the page title using the HTML Agility Pack, and then import it into Umbraco using CMSImport.
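The "loop over all the files and extract the page title" step might look like the following minimal Python sketch. It swaps in Python's built-in `html.parser` for the HTML Agility Pack (a .NET library), and assumes the downloaded pages sit in one folder of `.html` files saved as UTF-8; `TitleParser`, `extract_title` and `collect_pages` are hypothetical names. Files without a `<title>` fall back to their file name.

```python
import pathlib
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._chunks = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace and newlines into single spaces.
            self.title = " ".join("".join(self._chunks).split())

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)


def extract_title(html, fallback):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or fallback


def collect_pages(folder):
    """Yield (title, html) for every .html file, sorted by file name."""
    for path in sorted(pathlib.Path(folder).glob("*.html")):
        # errors="replace" keeps going if an old issue is not valid UTF-8.
        html = path.read_text(encoding="utf-8", errors="replace")
        yield extract_title(html, fallback=path.stem), html
```

The resulting (title, html) pairs can then be wrapped in the XML format described earlier in the thread and imported with CMSImport.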
