I imported 3000 records of data using CMS Import Pro. It took a lot of running in small batches, but eventually it worked and all of the data is now in the right place. The original import contained a list of establishments in xlsx format, with their name, address and postcode and lots of other details. Now though, I need to plot these establishments onto a google map so I need the latitude/longitude values stored for each establishment. I've bulk converted the postcodes of the establishments into latitude and longitude fields and I need to add the values to my imported nodes. So I've created two new fields on the document type and I have a new xlsx file to import which just contains the establishment name, lat and lng fields. Just three columns. I want to use the establishment name as the mapping key to identify that the records should be updates rather than new records.
I've run the import process again using this new spreadsheet, and selecting the establishment name column as the identity for mapping, but the result is that it has just added all of this data as new records, except for one record which was updated. So I now have duplicate nodes in my content tree with exactly the same name.
Is there anything obvious that might be doing this (e.g. does the identifier column have to be a unique numeric rather than a string, for example) or any common gothas to be aware of? Am I correct in assuming that CMSImport should be able to handle this kind of update?
As a worst case scenario I could delete all of the nodes, merge the original data set spreadsheets and re-import the whole lot, but it took so long to get the original data imported I'd much prefer to do an update if possible.
As long as the keys are the same, the key columns are the same and the datasource type is the same you should be able to update those records,. CMSImport is using that to identify if the record is already imported or not. Can you have a look at the cmsimport relation table. I think something in the excel file must have caused those duplicates and you should be able to find what was causing this in that table.
Fantastic! It was a slight variation in the name of the key column between the original source and the update source that was responsible. These match now and the import has updated correctly. Perfect, thanks again.
No, not just a space. I had the original column titled 'Establishment Name' whereas the import sheet column title was just 'Name'. Since I'd mapped it correctly through the UI I didn't think it'd be an issue but looking at the DB table clearly the actual name of that column is important. Changed it and bingo!
Update creates new records rather than updating
Hi,
I imported 3000 records of data using CMS Import Pro. It took a lot of running in small batches, but eventually it worked and all of the data is now in the right place. The original import contained a list of establishments in xlsx format, with their name, address and postcode and lots of other details. Now though, I need to plot these establishments onto a google map so I need the latitude/longitude values stored for each establishment. I've bulk converted the postcodes of the establishments into latitude and longitude fields and I need to add the values to my imported nodes. So I've created two new fields on the document type and I have a new xlsx file to import which just contains the establishment name, lat and lng fields. Just three columns. I want to use the establishment name as the mapping key to identify that the records should be updates rather than new records.
I've run the import process again using this new spreadsheet, and selecting the establishment name column as the identity for mapping, but the result is that it has just added all of this data as new records, except for one record which was updated. So I now have duplicate nodes in my content tree with exactly the same name.
Is there anything obvious that might be doing this (e.g. does the identifier column have to be a unique numeric rather than a string, for example) or any common gothas to be aware of? Am I correct in assuming that CMSImport should be able to handle this kind of update?
As a worst case scenario I could delete all of the nodes, merge the original data set spreadsheets and re-import the whole lot, but it took so long to get the original data imported I'd much prefer to do an update if possible.
Thanks!
Hi Dan,
As long as the keys are the same, the key columns are the same and the datasource type is the same you should be able to update those records,. CMSImport is using that to identify if the record is already imported or not. Can you have a look at the cmsimport relation table. I think something in the excel file must have caused those duplicates and you should be able to find what was causing this in that table.
Thanks,
Richard
Okay, thanks Richard, I'll take a peek at the database and report back. Thanks for the prompt support, as ever :)
Fantastic! It was a slight variation in the name of the key column between the original source and the update source that was responsible. These match now and the import has updated correctly. Perfect, thanks again.
Hi Dan,
Great to hear that it solved the issue. It wasn't just a space was it?
Thanks,
Richard
Hi Richard,
No, not just a space. I had the original column titled 'Establishment Name' whereas the import sheet column title was just 'Name'. Since I'd mapped it correctly through the UI I didn't think it'd be an issue but looking at the DB table clearly the actual name of that column is important. Changed it and bingo!
is working on a reply...