Retrieve and display static HTML in body area?
I've got several years' worth of archives of an HTML email newsletter that I need to include in our new Umbraco web site. I can manually paste each issue into a rich text editor body text field, but it's very time-consuming and error-prone.
Does anyone know of a way to automatically/dynamically suck that content into an Umbraco page? Or at least index it and include it in my site search and then generate a link list to the issues?
Hey Connie,
Are the newsletters in a consistent format so you can extract data from them?
Apparently SQL Server DTS Import is capable of reading and parsing HTML/web pages, so you could try that or something similar.
If you can get it into any database format then you could easily import the content into Umbraco using CMS Import http://our.umbraco.org/projects/developer-tools/cmsimport
Rich
They go all the way back to 2002 so they start with iffy structure and correct format, going to current day with clean format. Is there an HTML to XML converter that I can try online without having to install one?
Not used it myself....
http://blackpoint.dk/umbraco-workbench/packages/tidyhtml.aspx?p=0
I think the source is available if you need to update it to be 4.5 schema compliant.
Interesting package but it looks like a way to get data when the source does not provide a webservice or RSS feed. For my use, I want to keep all the formatting.
Hi,
How are the emails stored? As mentioned before, you can use CMSImport for this. Depending on the format they're stored in now, it's just a matter of stepping through a wizard and selecting a few items in a pulldown menu.
Cheers,
Richard
The content looks like:
http://www.myazbar.org/eLegal/archives/020611/home.cfm
http://www.myazbar.org/eLegal/archives/060808/home.cfm
What you could do is wrap the whole HTML of the page inside an XML file, see example below. Then you can map against the properties. Another option is to export the files to HTML and store them in a separate folder in your Umbraco install. Then you don't have to import them, but you can reference them from the site and they will be included in search results.
Cheers,
Richard
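A minimal sketch of the wrapping idea, in Python, assuming one XML element per issue with the raw HTML kept in a CDATA section so the formatting survives untouched. The element names (`newsletter`, `title`, `bodyText`) and the helper functions are made up for illustration; use whatever names you plan to map to your document type properties.

```python
from xml.sax.saxutils import escape


def cdata(text: str) -> str:
    """Wrap text in a CDATA section.

    A literal ']]>' inside the text would end the section early,
    so it is split across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def wrap_issue(title: str, html: str) -> str:
    """Return one <newsletter> element (hypothetical element names)."""
    return (
        "<newsletter>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <bodyText>{cdata(html)}</bodyText>\n"
        "</newsletter>"
    )
```

Calling `wrap_issue("Sample issue", "<p>Hello</p>")` returns a small XML fragment whose `bodyText` carries the HTML verbatim, which is the point of CDATA here: the markup does not need to be well-formed XML itself.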
I was reading about CDATA yesterday and am intrigued. I wondered if there could be a CDATA data type in Umbraco for places where it would be helpful for users to copy and paste HTML data into an Umbraco page. I tried pasting the HTML into an RTE field, but in some cases it causes a YSOD.
I think you can do that with a Textbox multiple datatype. Just paste the HTML into that field and see what happens.
That actually works well. But, alas, I think I am going to give up, as the embedded style sheet in each newsletter wants to throw off the formatting of the containing page. In the end, I'm better off leaving each newsletter as a standalone container, warts and all, and just iFrame it in my Umbraco site.
Now, I'd love to find a tool that would crawl the newsletters and save them as static HTML as I discovered they are still in ColdFusion. Otherwise I have to manually save about 500 of them.
I'm not sure if I understand you well, but you could use http://www.httrack.com/page/1/en/index.html to save your pages as static HTML and import them afterwards with CMSImport.
Yep, HTTrack is the way to download the files, but then you still need to wrap them in an XML document as I mentioned before. That wouldn't be too hard: just loop over all the files, maybe extract some data such as the page title using HTML Agility Pack, and then import it into Umbraco using CMSImport.
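The loop described above could look roughly like this. It is only a sketch under a few assumptions: it uses Python's standard `html.parser` in place of HTML Agility Pack (which is a .NET library), it assumes the mirrored files end in `.html` and are readable as UTF-8 (undecodable bytes are replaced), and the element names are hypothetical. Here the XML writer escapes the markup instead of using CDATA, which gives an equivalent result once the file is parsed.

```python
from html.parser import HTMLParser
from pathlib import Path
import xml.etree.ElementTree as ET


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html: str, fallback: str) -> str:
    """Return the page title with whitespace collapsed, or the fallback."""
    parser = TitleParser()
    parser.feed(html)
    title = " ".join("".join(parser.parts).split())
    return title or fallback


def build_import_xml(folder: Path) -> ET.Element:
    """Build one <newsletter> element per .html file under folder."""
    root = ET.Element("newsletters")
    for path in sorted(folder.rglob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        item = ET.SubElement(root, "newsletter")
        ET.SubElement(item, "title").text = extract_title(html, path.stem)
        ET.SubElement(item, "sourceFile").text = path.relative_to(folder).as_posix()
        ET.SubElement(item, "bodyText").text = html  # escaped on write
    return root
```

To write the result to disk, something like `ET.ElementTree(build_import_xml(Path("mirror"))).write("newsletters.xml", encoding="utf-8", xml_declaration=True)` should do, where `mirror` stands for whatever folder HTTrack saved into.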