
Reading file from a ZIP archive on FTP server without downloading to local system


My target file on the FTP server is a ZIP file, and the .CSV is located two folders further in.

How can I use BytesIO to let pandas read the CSV without downloading it to disk?

This is what I have so far:

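from ftplib import FTP
from io import BytesIO

ftp = FTP('FTP_SERVER')
ftp.login('USERNAME', 'PASSWORD')
flo = BytesIO()
# Stream the archive into memory instead of writing it to a local file
ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
flo.seek(0)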

With flo as my BytesIO object of interest, how would I be able to navigate a few folders down within the object, to allow pandas to read my .csv file? Is this even necessary?


Solution

  • The zipfile module accepts file-like objects for both the archive and its individual members, so you can extract the CSV file without writing the archive to disk. And since read_csv also accepts a file-like object, this should work, provided you have enough memory to hold the archive:

    ...
    flo = BytesIO()
    ftp.retrbinary('RETR /ParentZipFolder.zip', flo.write)
    flo.seek(0)
    with ZipFile(flo) as archive:
        with archive.open('foo/fee/bar.csv') as fd:
            df = pd.read_csv(fd)  # add relevant options here, including encoding if it matters
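
If the exact path of the CSV inside the archive is not known in advance, the archive can be inspected first. The following is only a minimal sketch: the helper names find_csv_members and read_csv_from_zip are hypothetical, and a small ZIP built in memory stands in for the bytes that ftp.retrbinary would write into flo. The member path 'foo/fee/bar.csv' is the placeholder from the answer above, not a real file.

```python
from io import BytesIO
from zipfile import ZipFile


def find_csv_members(archive_file):
    """Return the paths of all CSV files inside a ZIP held in a file-like object."""
    with ZipFile(archive_file) as archive:
        # namelist() returns full member paths, including any folders
        return [name for name in archive.namelist() if name.lower().endswith(".csv")]


def read_csv_from_zip(archive_file, member, **read_csv_kwargs):
    """Read one member of the archive into a DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so find_csv_members works without pandas

    with ZipFile(archive_file) as archive:
        with archive.open(member) as fd:
            return pd.read_csv(fd, **read_csv_kwargs)


# Stand-in for the FTP download: build a tiny archive in memory.
# In the real case, flo would be filled by ftp.retrbinary(...).
flo = BytesIO()
with ZipFile(flo, "w") as archive:
    archive.writestr("foo/fee/bar.csv", "a,b\n1,2\n")
flo.seek(0)

members = find_csv_members(flo)
print(members)
```

With pandas installed, read_csv_from_zip(flo, members[0]) should then return the DataFrame, and any read_csv options such as encoding can be passed as keyword arguments. Folders inside a ZIP are just part of each member's name, so there is nothing to navigate: the full path is passed to archive.open directly.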