web page content migration


Faiyaz 19 posts 39 karma points

Feb 23, 2011 @ 23:32


Hi

We are analysing a content migration project: SharePoint to Umbraco.

The content (web pages) of a SharePoint website is to be migrated onto an Umbraco website. There are hundreds of pages on that SharePoint website. An example page:

http://www.improvingnhsscotland.scot.nhs.uk/case-studies/Pages/Non_Arthritic_Knee_Pathway_in_Ayrshire_%20and_Arran.aspx

I would like to hear your thoughts on an efficient way of migrating that content into an Umbraco website. Please note that the SharePoint page content is not structured content.

I am thinking of providing a rich text box and copy-pasting the content of each web page into it, so that it automatically inherits some markup such as headings. We do have the manual resources to copy and paste, but not to catalogue every part of each page.

Regards

Faiyaz

Sascha Wolter 615 posts 1101 karma points

Feb 24, 2011 @ 00:32


Hi Faiyaz,

Copy-pasting the content into a rich text box will certainly work. You might need to provide a couple of custom styles for the editor, but that's not a big deal and you would probably need to do that anyway.

However, when you are talking about hundreds of these content pages, some kind of automation would be nice. Since it's SharePoint, everything is held in a database, so it should be accessible one way or another. For example, you could build some web services in SharePoint that expose the content you need, or access the SharePoint database directly and get the content out yourself (I haven't looked at a SharePoint database yet; it's probably quite messy...).

You could then write a small program that fetches the data and adds the pages to Umbraco one by one, automatically filling in the required properties. I recently wrote such a tool when converting a standard ASP.NET website to Umbraco, and it worked quite well. Another advantage of this approach is that you can automatically save the URL aliases in Umbraco, so your existing URLs are mapped to the new Umbraco URLs.
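One small piece of such a tool is deriving an alias for each migrated page from its old SharePoint URL. A minimal Python sketch, assuming the alias is stored as the decoded path without a leading slash or `.aspx` extension; check the exact format your Umbraco version expects for URL aliases before relying on it. The function name `legacy_alias` is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def legacy_alias(old_url: str) -> str:
    """Derive a URL-alias value from a legacy SharePoint page URL.

    Assumption: the alias is the decoded path, without a leading slash
    and without the .aspx extension.
    """
    path = unquote(urlparse(old_url).path)  # decode escapes such as %20
    root, ext = posixpath.splitext(path)
    if ext.lower() == ".aspx":
        path = root  # drop the SharePoint page extension
    return path.strip("/")


print(legacy_alias(
    "http://www.improvingnhsscotland.scot.nhs.uk/case-studies/Pages/"
    "Non_Arthritic_Knee_Pathway_in_Ayrshire_%20and_Arran.aspx"
))
```

Note that decoding keeps the space from `%20`; whether to keep, replace or drop such characters depends on how the new site matches incoming URLs.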

Reading the HTML of the existing pages would be another option. For example, if you know that the content always sits in <div id="main-content">...</div>, you could try to extract that and put the text into new Umbraco nodes. However, copy-pasting may still turn out to be more efficient.
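To illustrate the extraction idea, here is a minimal Python sketch using only the standard library's `html.parser`. It assumes, as in the example above, that the content lives in an element with `id="main-content"`; the class and function names are hypothetical, and it only tracks nesting of the matched element's own tag name, so badly broken HTML may need a more forgiving parser.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects the inner HTML of the first element whose id is target_id."""

    def __init__(self, target_id: str = "main-content"):
        super().__init__(convert_charrefs=False)  # re-emit entities such as &amp; unchanged
        self.target_id = target_id
        self.target_tag = None  # tag name of the matched element, e.g. "div"
        self.depth = 0          # > 0 while inside the matched element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.target_tag:
                self.depth += 1  # nested element with the same tag name
            self.parts.append(self.get_starttag_text())
        elif self.target_tag is None and dict(attrs).get("id") == self.target_id:
            self.target_tag, self.depth = tag, 1

    def handle_startendtag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                return  # the matched element's own closing tag is excluded
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_main_content(html: str, target_id: str = "main-content") -> str:
    parser = ContentExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = ('<div id="nav">Menu</div><div id="main-content"><h1>Knee pathway</h1>'
        '<div class="box"><p>A &amp; B<br/>C</p></div></div><div>Footer</div>')
print(extract_main_content(page))
```

The result is still HTML, so it can go straight into a rich text property on the new node.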

Also have a look at the CMSImport package by Richard Soeteman; I can see that working quite well with a custom DataAdapter for the SharePoint database.

Hope that helps,

Sascha

Faiyaz 19 posts 39 karma points

Feb 24, 2011 @ 07:56


Hi Sascha

Many thanks for your reply. Yes, I guess copy and paste or some kind of crawling are the options. Any kind of data export and import is ruled out, as the SharePoint people are not very accessible. I have some other questions:

Storage

------------

Since there are hundreds of pages and they grow by about 10 per week, I am a bit reluctant to store them as Umbraco nodes. I am thinking of storing the content, along with its markup, in a database instead. Please note that we also need to provide a new interface for adding such content in the new system from now on.

Search

-----------

The content above needs to be available for searching and browsing, so the stored markup could be a problem unless we strip it out before indexing.
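Stripping the markup before indexing can be quite simple. A minimal Python sketch using the standard library's `html.parser`; the set of block-level tags treated as word breaks is an assumption, and the names `TextExtractor` and `html_to_index_text` are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, treating block-level tags as word breaks."""

    BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "table", "tr", "td", "th",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> &, &nbsp; -> non-breaking space
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_index_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) into single spaces.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


print(html_to_index_text("<h1>Knee&nbsp;pathway</h1><p>Ayrshire &amp; <b>Arran</b></p>"))
```

Inline tags such as `<b>` add no space, so words split by formatting stay whole; the stored HTML itself is left untouched for display.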

All this would be easy if the content were stored as Umbraco nodes, but I am a bit concerned about having hundreds of nodes in an Umbraco website. I hear that some Umbraco websites have thousands of nodes, but still...

I would appreciate your thoughts on this.

Faiyaz

Sascha Wolter 615 posts 1101 karma points

Feb 25, 2011 @ 08:12


Hi Faiyaz,

Sorry for the delay; here are my two cents:

Storage: I'd say this amount of data/nodes is fine for Umbraco to handle. Roughly 500 pages per year wouldn't worry me too much; you could also consider moving older ones into some kind of 'archive' folder (while keeping their URLs) so they are out of the editors' way. One thing I would think through carefully, though, is how the nodes are arranged, e.g. grouped by date or alphabetically. You don't want someone opening a node and waiting for 5,000 child nodes to load. Also bear in mind that in Umbraco a node also determines the URL of the page. That means if you want Umbraco to handle the URL structure, you will need to create all these nodes anyway, so you might as well store the content on them.
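A minimal sketch of the date-based arrangement, assuming a hypothetical `case-studies/year/month` folder layout and the rate of about 10 new articles per week mentioned earlier in the thread. The simulation counts how many children each month folder would end up with over one year.

```python
from collections import Counter
from datetime import date, timedelta


def archive_path(published: date, slug: str, root: str = "case-studies") -> str:
    """Hypothetical layout: root/year/month/slug, so each month folder
    holds only that month's articles."""
    return f"{root}/{published.year}/{published.month:02d}/{slug}"


# Simulate one year: 10 articles published every Monday, starting 3 Jan 2011.
start = date(2011, 1, 3)
children_per_folder = Counter(
    archive_path(start + timedelta(weeks=week), f"article-{week}-{n}").rsplit("/", 1)[0]
    for week in range(52)
    for n in range(10)
)
print(len(children_per_folder), max(children_per_folder.values()))  # 12 folders, at most 50 children each
```

About 520 articles a year spread over 12 month folders keeps every list short enough to open quickly, and the year/month segments also give readable URLs.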

Saving the data in a custom database is definitely possible as well. You could skip nodes for your articles entirely and just have a generic page that gets the content via a query string.
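The lookup step of such a generic page could look roughly like this. It is a minimal Python sketch, assuming a hypothetical `articles` table with `id`, `title` and `body_html` columns and an `?id=` query-string parameter; in Umbraco this logic would live in the page's template or a macro rather than in Python.

```python
import sqlite3
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def load_article(conn: sqlite3.Connection, url: str) -> Optional[Tuple[str, str]]:
    """Look up the article named by ?id=N in a custom table (hypothetical schema)."""
    params = parse_qs(urlparse(url).query)
    try:
        article_id = int(params["id"][0])
    except (KeyError, ValueError):
        return None  # missing or non-numeric id: the page should show a 404
    # Parameterised query, so the query string cannot inject SQL.
    return conn.execute(
        "SELECT title, body_html FROM articles WHERE id = ?", (article_id,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body_html TEXT)")
conn.execute("INSERT INTO articles VALUES (23456, 'Knee pathway', '<p>...</p>')")
print(load_article(conn, "/article.aspx?id=23456"))
```

The trade-off is that every article shares one URL with a different `id`, so you lose the per-node URLs Umbraco would otherwise generate.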

A middle ground, if you definitely want to store the content in a custom table, could be a custom Umbraco data type that is basically a WYSIWYG editor but gets its content from the custom database. For example, say you have an article whose Umbraco node has Id 23456: when you edit the page in Umbraco, your custom data type requests entry 23456 from your custom table and displays the content. On save, the data is not written to the node; instead the custom data type saves it to the custom table. That is definitely possible, but in my humble opinion unnecessary.
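The storage side of that data type boils down to a load/save pair keyed on the node id. A minimal Python sketch of just that part, assuming a hypothetical `article_content` table; it is not Umbraco's data type API, which would be written in C# and call something like `load()` when the editor opens and `save()` when the node is saved.

```python
import sqlite3


class ExternalContentStore:
    """Keeps rich-text content in a custom table keyed by Umbraco node id.

    Table and column names are hypothetical.
    """

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS article_content ("
            " node_id INTEGER PRIMARY KEY, body_html TEXT NOT NULL)"
        )

    def load(self, node_id: int) -> str:
        row = self.conn.execute(
            "SELECT body_html FROM article_content WHERE node_id = ?", (node_id,)
        ).fetchone()
        return row[0] if row else ""  # new node: start with an empty editor

    def save(self, node_id: int, body_html: str) -> None:
        # Insert on the first save, overwrite on later saves; commit on exit.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO article_content (node_id, body_html) VALUES (?, ?)",
                (node_id, body_html),
            )


store = ExternalContentStore(sqlite3.connect(":memory:"))
store.save(23456, "<p>First draft</p>")
store.save(23456, "<p>Edited</p>")
print(store.load(23456))
```

Everything else (publishing, versioning, URLs) still comes from the node, which is part of why storing the content on the node itself is usually simpler.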

Search: For that number of nodes I would definitely check out the Examine search engine, which works nicely with Umbraco. It's built on the marvellous Lucene.Net library and should serve your requirements well. You can fine-tune which fields are indexed, how they are indexed, what gets returned, etc.
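Examine does this for you, so the following is not Examine's or Lucene.Net's API. It is only a toy Python illustration of the idea behind choosing which fields are indexed and how much each one counts: an inverted index where each configured field has a weight. The class name and weights are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class FieldIndex:
    """Toy inverted index: only the configured fields are indexed, each with a weight."""

    def __init__(self, field_weights: Dict[str, float]):
        self.field_weights = field_weights
        # term -> {doc_id: accumulated score}
        self.postings = defaultdict(lambda: defaultdict(float))

    def add(self, doc_id: int, doc: Dict[str, str]) -> None:
        for field, weight in self.field_weights.items():
            for term in tokenize(doc.get(field, "")):
                self.postings[term][doc_id] += weight

    def search(self, query: str) -> List[int]:
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, score in self.postings.get(term, {}).items():
                scores[doc_id] += score
        # Highest score first; ties broken by id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = FieldIndex({"title": 2.0, "body": 1.0})  # "author" is deliberately not indexed
index.add(1, {"title": "Knee pathway", "body": "Ayrshire and Arran", "author": "Smith"})
index.add(2, {"title": "Hip clinic", "body": "Knee exercises", "author": "Jones"})
print(index.search("knee"))
```

A title match outranks a body match, and the unindexed `author` field cannot be found at all; Examine's index configuration expresses the same kind of choices declaratively.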

Hope that helps your considerations,

Sascha

Richard Soeteman 4053 posts 12926 karma points MVP 2x

Feb 25, 2011 @ 09:15


Hi Faiyaz,

We recently migrated a Dutch site from SharePoint to Umbraco (the end result, in Dutch, can be found at http://www.kinggemeenten.nl/gemma ). The external party delivered an XML dump, for which I wrote a few custom CMSImport DataAdapters. I don't know SharePoint that well, but I do know that everything in SharePoint is a list, so I think there are probably better options for getting the data per list using a web service. Once I find out, I'll create a custom DataAdapter for SharePoint.
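The DataAdapters themselves are not shown in the thread, so here is only a rough Python illustration of the underlying step: turning an XML dump into flat records (URL, title, body) ready for import. The `<pages>/<page>` format is entirely hypothetical; a real SharePoint export will look different, and this is not the CMSImport API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

SAMPLE = """<pages>
  <page url="/case-studies/Pages/Knee.aspx">
    <title>Knee pathway</title>
    <body><![CDATA[<p>Ayrshire &amp; Arran</p>]]></body>
  </page>
</pages>"""


def read_dump(xml_text: str) -> Iterator[Dict[str, str]]:
    """Yield one flat record per <page> element (hypothetical dump format)."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        yield {
            "url": page.get("url", ""),
            "title": (page.findtext("title") or "").strip(),
            # CDATA keeps the page markup intact as a single text value.
            "body_html": page.findtext("body") or "",
        }


for record in read_dump(SAMPLE):
    print(record["url"], "->", record["title"])
```

Each record maps naturally onto one Umbraco node: the title becomes the node name, the body goes into a rich text property, and the old URL can feed the URL alias.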

I've also seen Linq2Sharepoint and a SharePoint connector (not cheap) for getting data out of SharePoint. Maybe you can use one of those to retrieve the data.

I would not worry too much about the number of documents; I've seen implementations with 10,000+ nodes performing very well. As Sascha said, it's about how you structure the data. Make sure you don't have too many child nodes under any one node and you'll be fine.
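A quick way to check the structure before or after an import is to count children per parent. A minimal Python sketch, assuming the tree is available as a hypothetical `{node_id: parent_id}` mapping; the limit of 100 is an arbitrary example, not an Umbraco rule.

```python
from collections import Counter
from typing import Dict, Optional


def crowded_parents(parent_of: Dict[int, Optional[int]], limit: int = 100) -> Dict[int, int]:
    """Return {parent_id: child_count} for parents with more than `limit` children.

    parent_of maps each node id to its parent id (None for the root).
    """
    counts = Counter(p for p in parent_of.values() if p is not None)
    return {parent: n for parent, n in counts.items() if n > limit}


# A flat "Case studies" folder (id 2) with a year's worth of articles directly under it.
tree = {1: None, 2: 1}
tree.update({1000 + i: 2 for i in range(520)})
print(crowded_parents(tree))
```

Any parent reported here is a candidate for splitting into sub-folders, for example by date.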

Cheers,

Richard

Faiyaz 19 posts 39 karma points

Feb 25, 2011 @ 18:11


Hi Sascha and Richard

Excellent. Thanks so much for all your input. It counts!!!

Regards

Faiyaz
