Web page Content Migration
Hi
We are analysing a content migration project: SharePoint to Umbraco.
The content (web pages) from a SharePoint website is to be migrated to an Umbraco website. There are hundreds of pages on that SharePoint website. An example of a page is:
http://www.improvingnhsscotland.scot.nhs.uk/case-studies/Pages/Non_Arthritic_Knee_Pathway_in_Ayrshire_%20and_Arran.aspx
I would like to hear thoughts on an efficient way of migrating that content into an Umbraco website. Please note that the SharePoint web page content is not structured content.
I am thinking of providing a rich text box and copy-pasting the content of each web page into it, so that it automatically inherits some markup such as headings. We do have manual resource for copy-pasting, but not for cataloguing each whole page.
Regards
Faiyaz
Hi Faiyaz,
Copy-pasting the content into a rich text box will surely work. You might need to provide a couple of custom styles for the editor, but that's not a biggie and you would need to do that anyway, I guess.
However, when you talk about hundreds of these 'content' pages, some kind of automation would be nice. Since it's SharePoint, everything is held in a database, so it should be accessible one way or another. E.g. you could build some web services in SharePoint that expose the content you require, or directly access the SharePoint database to get the content out yourself (I haven't had a look at any SP database yet; it's probably quite messy...).
You could then write a small program that gets the data and adds the pages one by one to Umbraco, automatically filling the required properties. I recently wrote such a tool when converting a standard ASP.NET website to Umbraco, and it worked quite well. Another plus of this method is that you can automatically save the UrlAliases in Umbraco, so your existing URLs will be mapped to the new Umbraco URLs.
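To illustrate the URL alias idea, here is a rough sketch in Python (used purely for illustration; a real import tool for Umbraco would more likely be written in .NET). The function name `old_url_to_alias` is made up for this example, and the exact alias format Umbraco expects (leading slash, file extension, encoding) is an assumption that should be checked against your Umbraco version.

```python
from urllib.parse import urlparse, unquote


def old_url_to_alias(old_url: str) -> str:
    """Derive a relative alias from an old page URL (hypothetical helper).

    Assumption: the alias is the old path, percent-decoded, without the
    leading slash. Adjust to whatever your Umbraco version expects.
    """
    path = unquote(urlparse(old_url).path)  # drop scheme/host, decode %20 etc.
    return path.lstrip("/")


# example.com is a placeholder host, not a real site from this thread.
print(old_url_to_alias("http://example.com/case-studies/Pages/My%20Page.aspx"))
```

The import program would store the returned string on each new node as it creates it, so the old address keeps resolving.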
Reading out the HTML of the existing pages would be an idea as well. E.g. if you know that the content always sits in <div id="main-content">...</div>, you could try to extract that and put the text in new Umbraco nodes. However, it's questionable whether that would be more efficient than copy-pasting.
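A minimal sketch of that extraction, in Python with only the standard library, might look like the following. It assumes the wrapper really is a div with id "main-content" on every page (the id is just the example from above, not something confirmed for the real site), and the names `MainContentExtractor` and `extract_main_content` are invented for this sketch.

```python
from html.parser import HTMLParser


class MainContentExtractor(HTMLParser):
    """Collects the inner HTML of the first <div id="..."> that matches."""

    def __init__(self, target_id="main-content"):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # nesting depth of <div> tags inside the target
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entered the target div; do not emit its own tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through without a closing tag.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return      # closing tag of the target div itself
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_main_content(html, target_id="main-content"):
    parser = MainContentExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Only div nesting is counted, so unclosed tags like a bare p or li inside the content do not throw the extraction off. Pages that deviate from the expected wrapper would come back empty and could be queued for manual copy-pasting instead.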
Also have a look at the CMSImport package by Richard Soeteman; I can see that working quite well with a custom DataAdapter for the SharePoint database.
Hope that helps,
Sascha
Hi Sascha
Many thanks for your reply. Yes, I guess copy & paste or some crawling are the options. Any kind of data export & import is ruled out, as the SharePoint chaps are not very accessible. I have some other questions:
Storage
------------
Since the number of pages is in the hundreds and grows by about 10 per week, I am a bit reluctant to have them as Umbraco nodes. I am thinking of storing the content, along with the markup, in the database. Please note we need to provide a new interface for adding such content in the new system from now on.
Search
-----------
The above content needs to be available for search and browse, so the stored markup could be a problem unless we strip it out before indexing.
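For what it's worth, stripping the markup before indexing is a small job. Here is a rough Python sketch using only the standard library; the names `TextOnly` and `html_to_text` are made up for illustration, and a real search integration may well do this step for you.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keeps text nodes and drops tags, plus anything inside script/style."""

    def __init__(self):
        super().__init__()   # default settings decode entities like &amp;
        self.chunks = []
        self.skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)


def html_to_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Join with spaces so text from adjacent blocks does not run together,
    # then collapse whitespace. Note this also puts a space where an inline
    # tag splits a word, which is usually acceptable for search indexing.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
```

The stored markup stays untouched for display; only the plain-text version goes to the index.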
All this would be easy if the content were stored as Umbraco nodes, but I am a bit concerned about having hundreds of nodes in an Umbraco website. I hear some Umbraco websites have thousands of nodes, but still...
I would appreciate your thoughts on this.
Faiyaz
Hi Faiyaz,
sorry for the delay, here are my 2 cents:
Storage: I'd say the amount of data/nodes is quite alright for Umbraco to handle. Roughly 500 pages per year wouldn't worry me too much, and you could also consider moving older pages to some kind of 'archive' folder (while keeping their URL) so they get out of the editors' way. One thing I would think through properly, though, is the arrangement of the nodes, e.g. you can order them by date or alphabetically in some way. You don't want someone opening a node and waiting for 5000 sub-nodes to populate. You also have to consider that a node in Umbraco also creates the URL for the page. That means if you want Umbraco to handle the URL structure you will need to have all these nodes created anyway, so you may as well save the content on them.
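As an illustration of arranging nodes by date, here is a small Python sketch that buckets pages into year/month folders so that no single parent ends up with thousands of children. The function names and the year/month layout are assumptions for this example, not an Umbraco feature.

```python
from collections import defaultdict
from datetime import date


def archive_path(published: date, slug: str) -> str:
    """Place a page under year/month folders, e.g. 2010/03/my-page."""
    return f"{published.year}/{published.month:02d}/{slug}"


def children_per_folder(pages):
    """Count how many pages land in each year/month folder."""
    counts = defaultdict(int)
    for published, slug in pages:
        folder = archive_path(published, slug).rsplit("/", 1)[0]
        counts[folder] += 1
    return dict(counts)
```

At roughly 10 new pages a week, a monthly folder would hold somewhere around 40 to 45 pages, which stays comfortable to browse in the tree.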
Saving the data in a custom database is definitely possible as well. You could have no nodes for your articles at all, just a generic page that gets the content via a query string.
The middle ground, if you definitely want to save the content in a custom table, could be a custom Umbraco data type which is basically a WYSIWYG editor yet gets its content from the custom DB. E.g. you have an article with an Umbraco node with Id 23456; when you edit the page in Umbraco, your custom data type will request entry Id 23456 from your custom data table and display the content. Upon saving, the data will not be written to the node; instead the custom data type will save it to the custom table. That is definitely possible, but in my humble opinion unnecessary.
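The load/save round trip described above boils down to a table keyed by the node Id. A minimal sketch in Python with SQLite, just to show the shape of it (the table name `article_content` and the function names are hypothetical, and the real thing would live inside a custom Umbraco data type rather than a standalone script):

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the custom content table (in-memory by default for the demo)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article_content ("
        "node_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_content(conn, node_id, body):
    """Called on save: write the editor's HTML to the custom table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO article_content (node_id, body) VALUES (?, ?)",
            (node_id, body),
        )


def load_content(conn, node_id):
    """Called when the editor opens: fetch the HTML for this node, if any."""
    row = conn.execute(
        "SELECT body FROM article_content WHERE node_id = ?", (node_id,)
    ).fetchone()
    return row[0] if row else ""
```

Usage would be along the lines of `save_content(conn, 23456, "<p>...</p>")` on save and `load_content(conn, 23456)` when the editor opens the page.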
Search: For that number of nodes I would definitely check out the Examine search engine, which works nicely with Umbraco. It's an implementation of the marvellous Lucene.Net library and will serve your requirements well. You can fine-tune which fields are indexed, how they are indexed, what you want returned, etc.
Hope that helps your considerations,
Sascha
Hi Faiyaz,
We recently migrated a Dutch site from SharePoint to Umbraco (the end result, in Dutch, can be found at http://www.kinggemeenten.nl/gemma ). The external party delivered us an XML dump, for which I wrote a few custom CMSImport DataAdapters. I don't know SharePoint that well, but what I do know is that everything in SharePoint is a List, so I think there are probably better options for getting the data per list using a web service. Once I find out, I'll create a custom DataAdapter for SharePoint.
I've also seen Linq2Sharepoint and a SharePoint connector (not cheap) for getting the data out of SharePoint. Maybe you can use one of those to retrieve the data.
I would not worry too much about the number of documents; I've seen implementations with 10,000+ nodes performing very well. Like Sascha said, it's about how you structure the data. Make sure you don't have too many child nodes per node and you'll be fine.
Cheers,
Richard
Hi Sascha and Richard
Excellent. Thanks so much for all your inputs. They cont !!!
Regards
Faiyaz