  • bayshield 50 posts 65 karma points
    Dec 30, 2009 @ 15:46

    Parsing CSV files with embedded commas

    Hi,

    Sorry, this post is not directly umbraco related, but this is the most helpful community I am a part of! I have a CSV file that contains fields with embedded commas, e.g. 1,4,55,"aaron,williams",1

    What is the easiest way for me to remove the comma in "aaron,williams"? A field with an embedded comma will always be surrounded by double quotes.

    Thanks

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Dec 30, 2009 @ 16:07

    Haha... we sure try to be friendly and helpful.

    And even though this is not really an umbraco question at all...

    I'd open the CSV in Visual Studio and use a regex search-replace for commas inside double quotes.

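    The regex would be something like this (in Visual Studio's Find and Replace syntax, {} tags a group and \1, \2 refer back to the tagged groups):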
    FIND:  "{.*},{.*}"
    REPLACE: "\1 \2"
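
    If the file has to be cleaned in code rather than in the editor, the same idea can be sketched in C# with System.Text.RegularExpressions. This is only a rough sketch: the class and method names are made up for illustration, and it assumes a quoted field never spans more than one line.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of umbraco or any library.
public static class QuotedCommaCleaner
{
    // One double-quoted field: a quote, any run of non-quote characters, a quote.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    // Replaces every comma inside a quoted field with a space;
    // commas outside quotes are left alone so they still separate fields.
    public static string ReplaceEmbeddedCommas(string line)
    {
        return QuotedField.Replace(line, m => m.Value.Replace(',', ' '));
    }
}
```

    Called on the sample line, QuotedCommaCleaner.ReplaceEmbeddedCommas("1,4,55,\"aaron,williams\",1") gives 1,4,55,"aaron williams",1.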

    cheers,
    doug.

  • Richard Soeteman 4049 posts 12922 karma points MVP 2x
    Dec 30, 2009 @ 16:39

    Hi,

    In UmbImport I'm using this custom CSV Parser. It's taking care of these issues.
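
    For anyone who cannot get at that parser: the following is not the UmbImport code, just a minimal sketch of the same kind of quote-aware reading using the TextFieldParser class that ships with .NET in Microsoft.VisualBasic.FileIO. The wrapper class and method names are invented for the example, and on .NET Framework it assumes the project references the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper, for illustration only.
public static class CsvReaderSketch
{
    // Reads every row from the reader and returns the fields of each row.
    // Quoted fields keep their embedded commas; the quotes are stripped.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

    Once the fields are split this way, the embedded comma can be removed per field with an ordinary string Replace.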

    Hope it helps you,

    Richard

  • Nik Wahlberg 639 posts 1237 karma points MVP
    Dec 30, 2009 @ 16:47

    Hi,

    What Doug suggests should work just fine for a one-off replace. However, if you're looking for a programmatic way to parse the file at some interval, the following class might help.

    using System;
    using System.Collections.Generic;

    public class StringFunctions
            {
                public static string[] Split(string expression, char delimiter, char qualifier, bool ignoreCase)
                {
                    expression = expression.Trim();
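                    // note: when ignoreCase is true the whole expression is lower-cased,
                    // so the returned fields are lower-cased as well
                    if (ignoreCase)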
                    {
                        expression = expression.ToLower();
                        delimiter = char.ToLower(delimiter);
                        qualifier = char.ToLower(qualifier);
                    }
    
                    int len = expression.Length;
                    List<string> list = new List<string>();
                    int begField, endField; // text cursors
                    for (begField = endField = 0; endField < len; begField = endField)
                    {
                        char s = expression[endField];
                        bool entityContainsQualifiers = false;
    
                        // move to the delimiter
                        while (s != delimiter)
                        {
                            if (s != qualifier)
                            {
                                // consume and continue if possible
                                ++endField;
                                if (len <= endField) { break; }
                                else { s = expression[endField]; continue; }
                            }
    
                            #region Consume Text Within Two Qualifiers
    
                            // we have the qualifier symbol
                            // then move to the closing one
                            entityContainsQualifiers = true;
                            bool foundClosingQualifier = false;
                            for (endField = endField + 1; endField < len; ++endField)
                            {
                                s = expression[endField];
                                if (endField + 1 < len)
                                {
                                    if (s == qualifier && expression[endField + 1] == delimiter)
                                    {
                                        foundClosingQualifier = true;
                                        break;
                                    }
                                }
                                else
                                {
                                    if (s == qualifier)
                                    {
                                        foundClosingQualifier = true;
                                        break;
                                    }
                                }
                            }
    
                            if (false == foundClosingQualifier)
                            {
                                throw new ArgumentException
                                    ("expression contains an unclosed qualifier symbol");
                            }
    
                            // consume the closing qualifier and continue if possible
                            ++endField;
                            if (len <= endField) { break; }
                            else { s = expression[endField]; continue; }
                            #endregion
    
                        }//while (s != delimiter)
                        // everything between the begField and endField cursors is the entity
                        string entity = expression.Substring(begField, endField - begField);
                        if (entityContainsQualifiers)
                        {
                            entity = entity.Replace(new string(qualifier, 1), "");
                        }
    
                        list.Add(entity);
    
                        // two possibilities:
                        // 1) we have found the delimiter
                        // 2) we have come to the end of the expression
                        // possibility (1)
    
                        if (s == delimiter)
                        {
                            // consume and continue if possible
                            ++endField;
                            if (len <= endField)
                            {
                                // this delimiter is the last symbol of the expression
                                // we should add the empty string as the last entity
                                // and leave
                                list.Add(string.Empty);
                                break;
                            }
                            else
                            {
                                // there are more entities in the expression
                                // proceed with collecting the entities
                                // note: s is initialized at the beginning of the main loop
                                continue;
                            }
                        }
                        else // possibility (2)
                        {
                            // leave the cycle
                            break;
                        }
                    }
                    return list.ToArray();
                }
            }

    I didn't write this code (and unfortunately don't remember where this is from...) but I used this a while back for parsing CSVs and if I recall it worked great. 
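
    One thing the class above does not appear to handle is a quote inside a quoted field written as a doubled quote (""), which some CSV exports produce. As a rough, separate sketch (not the original author's code; the CsvLine name and the default parameters are my own assumptions), a single-pass splitter that copes with that could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CsvLine
{
    // Splits one line on the delimiter, treating text between qualifiers as literal.
    // A doubled qualifier inside a quoted field is read as one literal qualifier.
    public static string[] Split(string line, char delimiter = ',', char qualifier = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != qualifier)
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == qualifier)
                {
                    current.Append(qualifier); // escaped qualifier
                    i++;
                }
                else
                {
                    inQuotes = false; // closing qualifier
                }
            }
            else if (c == qualifier)
            {
                inQuotes = true; // opening qualifier
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        if (inQuotes)
        {
            throw new ArgumentException("line contains an unclosed qualifier symbol");
        }

        fields.Add(current.ToString()); // last field (empty if the line ends with a delimiter)
        return fields.ToArray();
    }
}
```

    With the sample line from the first post, CsvLine.Split("1,4,55,\"aaron,williams\",1") returns five fields, the fourth being aaron,williams; calling Replace(",", " ") on that field then removes the embedded comma.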

    HTH,
    Nik
