Is there a way to parse a single comma-delimited string without using anything fancy like csv.reader(..)? I can use the split(',') function, but that doesn't work when a valid column value itself contains a comma. The csv library has readers for parsing CSV files that handle this special case correctly, but I can't use those because I need to parse just a single string. If Python's csv module does allow parsing a single string, though, that's news to me.
Take a closer look at the documentation for the csv module, which says:
reader(...)
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list. The
optional "dialect" parameter is discussed below. The function
also accepts optional keyword arguments which override settings
provided by the dialect.
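So if you have a string:
>>> import csv
>>> s = '"this is", "a test", "of the csv", "parser"'
And you want "an object that returns a line of input for each iteration", you can just wrap your string in a list (skipinitialspace=True makes the reader ignore the space after each comma, so the quotes that follow are recognized as quotes):
>>> r = csv.reader([s], skipinitialspace=True)
>>> list(r)
[['this is', 'a test', 'of the csv', 'parser']]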
And that's how you parse a string with the csv module.
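As a minimal sketch of the same idea, the list-wrapping trick can be put behind a small helper. The name parse_csv_line is hypothetical (it is not part of the csv module), and the sample line is made up for illustration; the example also shows how split(',') breaks a quoted value that contains a comma.

```python
import csv

def parse_csv_line(line, **fmtparams):
    """Parse one CSV-formatted string and return its fields as a list.

    parse_csv_line is a hypothetical helper name, not part of the csv module.
    Extra keyword arguments are passed straight through to csv.reader.
    """
    # A one-element list is an iterable that yields exactly one "line".
    reader = csv.reader([line], **fmtparams)
    return next(reader)

line = 'widget,"Smith, John",42'

# Naive splitting cuts the quoted value in two and keeps the quote characters.
naive = line.split(',')
print(naive)    # ['widget', '"Smith', ' John"', '42']

# The csv reader keeps the quoted value together and strips the quotes.
parsed = parse_csv_line(line)
print(parsed)   # ['widget', 'Smith, John', '42']
```

Because the keyword arguments are forwarded, something like parse_csv_line(s, skipinitialspace=True) should cover input that has a space after each comma.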
@rafaelc suggests that iter(s) might be more elegant, but unfortunately iter(s) will return an iterator over the characters in s. That is, given:
s = "'this is', 'a test', 'of the csv parser'"
r = csv.reader(iter(s))
for row in r:
    print(row)
We would get output like:
["'"]
['t']
['h']
['i']
['s']
[' ']
['i']
['s']
["'"]
.
.
.
I don't think there's any way to create a line iterator over a single string that's going to be better than simply wrapping it in a list.
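If the string happens to hold several lines rather than one, the same "list of lines" idea still seems to apply. The sketch below assumes a made-up two-record string named text and uses str.splitlines(keepends=True) to build the list; keeping the line endings matters because a quoted field may contain a newline of its own.

```python
import csv

# Hypothetical sample: a header plus one record whose quoted field spans two lines.
text = 'id,comment\n1,"first line\nsecond line"\n'

# keepends=True preserves each line's terminator, so the reader can
# reassemble the quoted field with its embedded newline intact.
lines = text.splitlines(keepends=True)

rows = list(csv.reader(lines))
print(rows)   # [['id', 'comment'], ['1', 'first line\nsecond line']]
```

Note that splitlines() also breaks on a few less common separators (form feed, for example), so this is only a reasonable fit when the input uses ordinary line endings.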
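As @alexce points out in their answer, we can achieve something similar using a StringIO object, but that requires substantially more overhead. Compare the size of s wrapped in a list with the size of a StringIO object built from it (the exact figures vary by Python version and platform):
>>> import io, sys
>>> sys.getsizeof([s])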
64
>>> sys.getsizeof(io.StringIO(s))
184
(And there's the cost of importing the io module, which requires both memory and time.)
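For a side-by-side check, here is a small sketch that feeds the same string to csv.reader both ways. The sample string is written without spaces after the commas so that no extra reader options are needed; the variable names are just for illustration.

```python
import csv
import io

s = '"this is","a test","of the csv","parser"'

# Approach 1: wrap the string in a one-element list.
from_list = list(csv.reader([s]))

# Approach 2: wrap the string in a file-like StringIO object.
from_stringio = list(csv.reader(io.StringIO(s)))

print(from_list)                    # [['this is', 'a test', 'of the csv', 'parser']]
print(from_list == from_stringio)   # True
```

Both approaches yield the same rows here, so the choice comes down to overhead and taste rather than correctness.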