
Convert CSV data to dict without writing file to disk


Here is my scenario: I am downloading a zip file with requests and keeping it in memory rather than writing it to disk. I open the data as a zipfile.ZipFile object called myzipfile. The zip file contains a CSV file, and I would like to convert each row of that CSV data into a dictionary. Here is what I have so far.

import csv
import zipfile
from io import BytesIO

import requests

# other imports etc.

r = requests.get(url=fileurl, headers=headers, stream=True)
filebytes = BytesIO(r.content)

myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    mycsv = myzipfile.open(name).read()
    for row in csv.DictReader(mycsv):  # it fails here.
        print(row)

Error:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)

It looks like csv.DictReader(mycsv) expects a file object rather than raw data. How do I convert the rows in mycsv (which is of type <class 'bytes'>) into a list of dictionaries? I'm trying to do this without writing a file to disk, working directly with the CSV data in memory.
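The error message can be reproduced in isolation. csv treats whatever it is given as an iterable of lines, and iterating over a bytes object yields integers (byte values), not strings. The sketch below uses a small hypothetical CSV payload in place of myzipfile.open(name).read(); it also shows that any iterable of str lines, not only a real file, is accepted once the bytes are decoded.

```python
import csv

# Hypothetical CSV content as bytes, like what myzipfile.open(name).read() returns
data = b"name,age\nalice,30\n"

# Iterating over bytes yields ints: the byte values of "n", "a", "m"
first_items = list(data)[:3]
print(first_items)  # [110, 97, 109]

# csv.DictReader pulls the header row first and receives an int instead of a str
error_message = None
try:
    next(csv.DictReader(data))
except csv.Error as exc:
    error_message = str(exc)
print(error_message)

# After decoding, a plain list of str lines works; no file object is required
first_row = next(csv.DictReader(data.decode("utf-8").splitlines()))
print(first_row)
```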


Solution

  • csv.DictReader expects a file-like object (more precisely, any iterable that yields lines as str): we can satisfy this expectation by loading the decompressed CSV text into an io.StringIO instance.

    Note that StringIO expects a str argument, but reading a member of the zip file returns bytes, so the data must be decoded first. Calling decode() with no argument decodes as UTF-8 (not the system's locale encoding). If the CSV was written in a different encoding, for example cp1252, pass that encoding explicitly, e.g. decode('cp1252').

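    import csv
    import io

    for name in myzipfile.namelist():
        # Read the member's raw bytes and decode them to str (change the encoding if needed)
        data = myzipfile.open(name).read().decode('utf-8')
        mycsv = io.StringIO(data)  # wrap the text in a file-like object
        reader = csv.DictReader(mycsv)
        for row in reader:
            print(row)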
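A possible alternative, sketched here assuming Python 3 and CSV members encoded as UTF-8 (possibly with a byte-order mark), is to wrap the binary stream returned by ZipFile.open() in io.TextIOWrapper, so the data is decoded as csv reads it instead of being read and decoded all at once. The encoding argument is where a non-UTF-8 encoding would go; 'utf-8-sig' additionally strips a leading BOM, which some spreadsheet exports add and which would otherwise end up glued to the first column name ('\ufeffname'). The helper name zip_csvs_to_dicts, the .csv filter, and the sample zip built in memory are all hypothetical; in the question's code the bytes would be r.content.

```python
import csv
import io
import zipfile


def zip_csvs_to_dicts(zip_bytes, encoding="utf-8-sig"):
    """Hypothetical helper: map each .csv member of an in-memory zip to a list of row dicts."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            # TextIOWrapper decodes the member lazily; newline="" follows the csv docs'
            # advice for file objects. Closing the wrapper also closes the zip member.
            with io.TextIOWrapper(zf.open(name), encoding=encoding, newline="") as text:
                result[name] = list(csv.DictReader(text))
    return result


# Build a small zip in memory to stand in for r.content (hypothetical data).
# The CSV starts with a BOM ("\ufeff"), which "utf-8-sig" removes while decoding.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("people.csv", "\ufeffname,age\nalice,30\nbob,25\n")
    zf.writestr("readme.txt", "not a CSV file\n")

rows_by_file = zip_csvs_to_dicts(buf.getvalue())
print(rows_by_file["people.csv"])
```

With the question's download, the call would look like zip_csvs_to_dicts(r.content). Decoding lazily means only the decompressed text currently being parsed is held as str, although r.content itself is still fully in memory.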