
Python 3: urllib, csv.reader, and StringIO: why the difference?


These two approaches yield different results in Python 3.7.3:

import csv
import urllib.request
from io import StringIO

res = urllib.request.urlopen(url, timeout=timeout)
content = res.read().decode('utf-8')
reader = csv.reader(StringIO(content))  # text wrapped in a file-like object
lines = list(reader)

And

res = urllib.request.urlopen(url, timeout=timeout)
content = res.read().decode('utf-8')
reader = csv.reader(content)  # the str itself is passed, not a file-like object
lines = list(reader)

The former gives me what I want: a list of the rows from the CSV. The latter gives me a list of single-element lists, one per character of the text (each character becomes its own row). So I get:

Year,PID
2019,1
2018,2

And

Y
e
a
r,
P
I
D
(etc)

What's the difference?


Solution

  • In the first case, StringIO(content) is an in-memory stream for text I/O. The documentation notes:

    For strings StringIO can be used like a file opened in text mode

    So csv.reader treats StringIO(content) as an open file, and reader is

    a reader object which will iterate over lines in the given csvfile

    So lines = list(reader) returns a list of the rows in content.
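The first case can be illustrated with a minimal sketch. The hard-coded content string below is an assumption that stands in for the decoded HTTP response; no network call is made.

```python
import csv
from io import StringIO

# Sample text standing in for the decoded response body (assumed data).
content = "Year,PID\n2019,1\n2018,2\n"

# Iterating over a StringIO yields one line at a time, like a text file.
stream_lines = list(StringIO(content))

# csv.reader consumes those lines and splits each one into fields.
rows = list(csv.reader(StringIO(content)))

print(stream_lines)  # ['Year,PID\n', '2019,1\n', '2018,2\n']
print(rows)          # [['Year', 'PID'], ['2019', '1'], ['2018', '2']]
```

Because the reader receives whole lines, each item in rows corresponds to one CSV record.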

  • In the second case, content is a plain str, so csv.reader(content) iterates over the string itself, one character at a time. This follows from the documented signature:

    csv.reader(csvfile, dialect='excel', **fmtparams)

    csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called

    That's why lines = list(reader) returns a list of characters: each character in content is treated as a row.
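The second case can be reproduced in a short sketch, again using a hard-coded sample string as an assumed stand-in for the downloaded text. It also shows one possible alternative to StringIO, splitting the text into lines before handing it to the reader.

```python
import csv

# Sample text standing in for the decoded response body (assumed data).
content = "Year,PID\n2019,1\n"

# Iterating over a str yields single characters, not lines.
chars = list(content)[:4]

# csv.reader therefore parses each character as if it were a whole line.
char_rows = list(csv.reader(content))

# Splitting into lines first gives the reader one record per item.
rows = list(csv.reader(content.splitlines()))

print(chars)          # ['Y', 'e', 'a', 'r']
print(char_rows[:4])  # [['Y'], ['e'], ['a'], ['r']]
print(rows)           # [['Year', 'PID'], ['2019', '1']]
```

For simple data, splitlines() works as well as StringIO. If quoted fields may contain line breaks, the StringIO form from the question is likely the safer choice, since splitlines() discards the line endings.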