
Parse a single CSV string?


Is there a way to parse a single comma-delimited string without using anything fancy like csv.reader(..)? I can use split(','), but that doesn't work when a valid column value itself contains a comma. The csv library has readers for parsing CSV files, which handle that special case correctly, but I can't use those because I need to parse just a single string. If the csv module can parse a single string directly, that's news to me.


Solution

  • Take a closer look at the documentation for the csv module, which says:

    reader(...)
        csv_reader = reader(iterable [, dialect='excel']
                                [optional keyword args])
            for row in csv_reader:
                process(row)
    
        The "iterable" argument can be any object that returns a line
        of input for each iteration, such as a file object or a list.  The
        optional "dialect" parameter is discussed below.  The function
        also accepts optional keyword arguments which override settings
        provided by the dialect.
    

    So if you have a string:

    >>> s = '"this is", "a test", "of the csv", "parser"'
    

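    And you want "an object that returns a line of input for each iteration", you can just wrap your string in a list. Because this string has a space after each comma, also pass skipinitialspace=True; without it, the default dialect keeps the space and the quote characters that follow it as literal field text:

    >>> import csv
    >>> r = csv.reader([s], skipinitialspace=True)
    >>> list(r)
    [['this is', 'a test', 'of the csv', 'parser']]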
    

    And that's how you parse a string with the csv module.
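
As a minimal sketch of the approach above, the list-wrapping trick can sit behind a small helper. The name parse_csv_line is hypothetical (it is not part of the csv module), the sample strings are made up, and the code assumes Python 3 and a string that holds exactly one record.

```python
import csv


def parse_csv_line(line, **fmtparams):
    """Parse one CSV record held in a string and return its fields as a list.

    parse_csv_line is a hypothetical helper name, not part of the csv module.
    Extra keyword arguments (e.g. skipinitialspace=True) go to csv.reader.
    """
    # csv.reader wants an iterable of lines, so wrap the single string in a list.
    reader = csv.reader([line], **fmtparams)
    # The reader yields one row per record; take the first (and only) one.
    return next(reader)


# A quoted field may contain the delimiter, which a plain split(',') would break.
print(parse_csv_line('1,"Doe, Jane",42'))
# A space after each comma needs skipinitialspace=True for the quotes to be honoured.
print(parse_csv_line('"this is", "a test"', skipinitialspace=True))
```

The helper returns a flat list of fields rather than a list of rows, which is usually what you want when the input is known to be a single line.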


    @rafaelc suggests that iter(s) might be more elegant, but unfortunately iter(s) will return an iterator over the characters in s. That is, given:

    s = "'this is', 'a test', 'of the csv parser'"
    r = csv.reader(iter(s))
    for row in r:
      print(row)
    

    We would get output like:

    ["'"]
    ['t']
    ['h']
    ['i']
    ['s']
    [' ']
    ['i']
    ['s']
    ["'"]
    .
    .
    .
    

    I don't think there's any way to create a line iterator over a single string that's going to be better than simply wrapping it in a list.
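
The same idea extends to a string that holds several records: anything that yields one line per iteration will do, and str.splitlines() is one way to get that. This is only a sketch with a made-up sample string, and it assumes that no quoted field contains a newline, since splitting on line breaks would cut such a field in two.

```python
import csv

# Made-up sample: two records in one string, separated by a newline.
text = 'id,name\n1,"Doe, Jane"'

# splitlines() returns a list of lines, which is the kind of
# iterable csv.reader expects.
rows = list(csv.reader(text.splitlines()))
print(rows)
```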

    As @alexce points out in their answer, we can achieve something similar using a StringIO object, but that carries substantially more overhead. Compare the size of a list wrapping s with the size of a StringIO built from s (exact figures vary with the Python version and platform):

    >>> import io, sys
    >>> sys.getsizeof([s])
    64
    >>> sys.getsizeof(io.StringIO(s))
    184
    

    (And there's the cost of importing the io module, which requires both memory and time).
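
For comparison, here is a sketch of the StringIO alternative next to the list-wrapping approach. The sample string is made up for illustration and the code assumes Python 3. Both readers should yield the same rows, so the choice comes down to overhead and taste.

```python
import csv
import io

sample = 'id,"Doe, Jane",42'  # made-up sample record

# Option 1: wrap the string in a list (one "line" per list element).
rows_from_list = list(csv.reader([sample]))

# Option 2: wrap the string in a file-like object. The csv docs recommend
# newline='' for file objects, and io.StringIO accepts the same argument.
rows_from_stringio = list(csv.reader(io.StringIO(sample, newline='')))

print(rows_from_list)
print(rows_from_stringio)
```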