Parsing CSV files with embedded commas

bayshield

Dec 30, 2009 @ 15:46

Hi,

Sorry, this post is not directly Umbraco-related, but this is the most helpful community I am part of! I have a CSV file that contains fields with embedded commas, e.g. 1,4,55,"aaron,williams",1

What is the easiest way for me to remove the comma in "aaron,williams"? It will always be surrounded by double quotes.

Thanks

Douglas Robar

Dec 30, 2009 @ 16:07

Haha... we sure try to be friendly and helpful.

And even though this is not really an Umbraco question at all...

I'd open the CSV in Visual Studio and use a regex search-replace for commas inside double quotes.

The regex would be something like this:
FIND: "{.*},{.*}"
REPLACE: "\1 \2"
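If the same fix has to run from code rather than in the editor, a similar search-replace can be sketched in C# with `Regex.Replace` and a `MatchEvaluator`. This is a minimal sketch under stated assumptions, not Doug's exact pattern: the class and method names are hypothetical, and it assumes quoted fields never contain escaped `""` quotes. The pattern `"[^"]*"` cannot run past a closing quote, so each match stays inside one quoted field, and the evaluator replaces every comma in that field rather than just one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes quoted fields contain no escaped "" quotes.
public static class QuotedCommaReplacer
{
    // An opening quote, a run of non-quote characters, and a closing quote.
    // [^"]* cannot cross a closing quote, so one match never spans two
    // separate quoted fields on the same line.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    // Replaces every comma inside a quoted field with `replacement`,
    // leaving the commas between fields untouched.
    public static string ReplaceCommasInQuotes(string line, string replacement)
    {
        return QuotedField.Replace(line, m => m.Value.Replace(",", replacement));
    }
}
```

For example, `QuotedCommaReplacer.ReplaceCommasInQuotes("1,4,55,\"aaron,williams\",1", " ")` gives `1,4,55,"aaron williams",1`.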

cheers,
doug.

Richard Soeteman

Dec 30, 2009 @ 16:39

Hi,

In UmbImport I'm using this custom CSV parser, which takes care of these issues.
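As an alternative to a custom parser (this is not the UmbImport code), .NET includes a quote-aware reader, `Microsoft.VisualBasic.FileIO.TextFieldParser`, that can be used from C#. A minimal sketch, assuming comma-delimited input; the wrapper class `CsvReaderSketch` is hypothetical, and on .NET Framework the project needs a reference to Microsoft.VisualBasic.dll.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper around the built-in TextFieldParser.
public static class CsvReaderSketch
{
    // Reads every record from comma-delimited text. Fields enclosed in
    // double quotes are returned as single values, commas included.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Each `string[]` in the result is one record, so `"aaron,williams"` comes back as a single field without its surrounding quotes.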

Hope it helps you,

Richard

Nik Wahlberg

Dec 30, 2009 @ 16:47

Hi,

What Doug suggests should work just fine for a one-off replace. However, if you're looking for a programmatic way to parse the file at some interval, the following class might help.

using System;
using System.Collections.Generic;
using System.Linq;

public class StringFunctions
{
    // Splits one line of delimited text into fields. Text enclosed in the
    // qualifier character (e.g. '"') may contain the delimiter.
    public static string[] Split(string expression, char delimiter, char qualifier, bool ignoreCase)
    {
        expression = expression.Trim();

        // compare characters case-insensitively when requested,
        // without altering the case of the returned field values
        Func<char, char, bool> same = (a, b) =>
            ignoreCase ? char.ToLowerInvariant(a) == char.ToLowerInvariant(b) : a == b;

        int len = expression.Length;
        List<string> list = new List<string>();
        int begField, endField; // text cursors
        for (begField = endField = 0; endField < len; begField = endField)
        {
            char s = expression[endField];
            bool entityContainsQualifiers = false;

            // move to the delimiter
            while (!same(s, delimiter))
            {
                if (!same(s, qualifier))
                {
                    // consume and continue if possible
                    ++endField;
                    if (len <= endField) { break; }
                    else { s = expression[endField]; continue; }
                }

                #region Consume Text Within Two Qualifiers

                // we have the qualifier symbol, so move to the closing one
                entityContainsQualifiers = true;
                bool foundClosingQualifier = false;
                for (endField = endField + 1; endField < len; ++endField)
                {
                    s = expression[endField];
                    if (endField + 1 < len)
                    {
                        // a closing qualifier is one followed by the delimiter...
                        if (same(s, qualifier) && same(expression[endField + 1], delimiter))
                        {
                            foundClosingQualifier = true;
                            break;
                        }
                    }
                    else if (same(s, qualifier))
                    {
                        // ...or one that is the last symbol of the expression
                        foundClosingQualifier = true;
                        break;
                    }
                }

                if (!foundClosingQualifier)
                {
                    throw new ArgumentException("expression contains an unclosed qualifier symbol");
                }

                // consume the closing qualifier and continue if possible
                ++endField;
                if (len <= endField) { break; }
                else { s = expression[endField]; continue; }

                #endregion
            } // while (s != delimiter)

            // everything between the begField and endField cursors is the entity
            string entity = expression.Substring(begField, endField - begField);
            if (entityContainsQualifiers)
            {
                // strip the qualifiers; note that an escaped "" inside a quoted
                // field is removed entirely rather than kept as a single "
                entity = new string(entity.Where(c => !same(c, qualifier)).ToArray());
            }

            list.Add(entity);

            // two possibilities:
            // 1) we have found the delimiter
            // 2) we have come to the end of the expression
            if (same(s, delimiter))
            {
                // possibility (1): consume the delimiter and continue if possible
                ++endField;
                if (len <= endField)
                {
                    // this delimiter is the last symbol of the expression,
                    // so add an empty string as the last entity and leave
                    list.Add(string.Empty);
                    break;
                }
                // otherwise there are more entities to collect;
                // s is re-initialized at the beginning of the main loop
            }
            else
            {
                // possibility (2): leave the loop
                break;
            }
        }
        return list.ToArray();
    }
}

I didn't write this code (and unfortunately don't remember where it's from), but I used it a while back for parsing CSVs and, if I recall correctly, it worked great.
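One limitation of the class above: it strips every qualifier from a quoted field, so an escaped quote written as `""` (the convention described in RFC 4180) disappears instead of becoming a single `"`. If that matters, a small character-by-character state machine can handle it. A minimal sketch, assuming one record per line (no line breaks inside quoted fields) and a comma delimiter; `CsvLine` is a hypothetical name, and unlike the class above it does not trim the line or throw on an unclosed quote.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical splitter; assumes one record per line and ',' as delimiter.
public static class CsvLine
{
    // Splits a single CSV line into fields. Commas inside quoted fields are
    // kept, and a doubled quote ("") inside a quoted field becomes one quote.
    public static string[] Split(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // escaped quote
                        i++;               // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(c);       // commas are kept inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;           // opening quote
            }
            else if (c == ',')
            {
                fields.Add(field.ToString()); // end of the current field
                field.Clear();
            }
            else
            {
                field.Append(c);
            }
        }

        fields.Add(field.ToString()); // last field (possibly empty)
        return fields.ToArray();
    }
}
```

For the line from the question, `CsvLine.Split("1,4,55,\"aaron,williams\",1")` returns five fields, the fourth being `aaron,williams`.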

HTH,
Nik
