python io ftplib python-zipfile

How to read a CSV file on FTP that sits inside a folder in a zip archive


I'm trying to:

  1. read a .csv file (compressed in a zip file that is stored on an FTP server) by using ftplib
  2. store the .csv file in an in-memory virtual file by using io
  3. turn the virtual file into a DataFrame by using pandas

[Image: file layout on the FTP server, showing path1 and path2]

For that I'm using the code below, and it works fine for the first scenario (path1, see image above):

CODE:

    import ftplib
    import zipfile
    import io
    import pandas as pd

    ftp = ftplib.FTP("theserver_name")
    ftp.login("my_username", "my_password")
    ftp.encoding = "utf-8"

    ftp.cwd('folder1/folder2')
    filename = 'zipFile1.zip'

    download_file = io.BytesIO()
    ftp.retrbinary("RETR " + filename, download_file.write)
    download_file.seek(0)
    zfile = zipfile.ZipFile(download_file)

    df = pd.read_csv(zfile.namelist()[0], delimiter=';')

    display(df)

But in the second scenario (path2), after changing my code as shown, I get the error below:

CODE UPDATE:

    ftp.cwd('folder1/folder2/')
    filename = 'zipFile2.zip'

ERROR AFTER UPDATE:

    FileNotFoundError: [Errno 2] No such file or directory: 'folder3/csvFile2.csv'

It seems like Python doesn't recognize folder3 (which is contained in zipFile2.zip). Is there an explanation for that? How can I fix it? I tried calling ftp.cwd('folder3') right before pd.read_csv(), but it doesn't work.


Solution

  • Thanks to a post by Serge Ballesta, I finally figured out how to load csvFile2.csv into a DataFrame:

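    import ftplib
    import zipfile
    import io
    import pandas as pd

    ftp = ftplib.FTP("theserver_name")
    ftp.login("my_username", "my_password")
    ftp.encoding = "utf-8"

    # Download the whole zip archive into an in-memory buffer.
    flo = io.BytesIO()
    ftp.retrbinary('RETR /folder1/folder2/zipFile2.zip', flo.write)
    flo.seek(0)

    # Open the member by its full name inside the archive (including folder3/)
    # and pass the resulting file object, not the name, to pandas.
    with zipfile.ZipFile(flo) as archive:
        with archive.open('folder3/csvFile2.csv') as fd:
            df = pd.read_csv(fd, delimiter=';')

    display(df)  # display() is available in Jupyter/IPython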
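
As for why the original code failed: `zfile.namelist()[0]` is just a string, and `pd.read_csv` treats a string as a path on the local disk, so it never looks inside the archive. That would also suggest the first scenario only worked because a file with that name happened to exist locally, though I can't confirm that from the question. Below is a minimal sketch of the same idea without FTP or pandas, so it can be run anywhere. The archive contents and the helper names (`build_demo_zip`, `first_csv_member`, `read_rows`) are made up for illustration; the file object returned by `archive.open(...)` is the same kind of object that is passed to `pd.read_csv` in the answer above.

```python
import csv
import io
import os
import zipfile


def build_demo_zip() -> io.BytesIO:
    """Build a small in-memory zip that mimics zipFile2.zip (made-up contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("folder3/csvFile2.csv", "a;b\n1;2\n3;4\n")
    buffer.seek(0)
    return buffer


def first_csv_member(archive: zipfile.ZipFile) -> str:
    """Return the first member ending in .csv, skipping directory entries."""
    for info in archive.infolist():
        if not info.is_dir() and info.filename.lower().endswith(".csv"):
            return info.filename
    raise FileNotFoundError("no .csv member found in the archive")


def read_rows(zip_buffer: io.BytesIO) -> list:
    """Read the first CSV member straight from the archive, not from disk."""
    with zipfile.ZipFile(zip_buffer) as archive:
        member = first_csv_member(archive)
        with archive.open(member) as raw:
            # archive.open() yields bytes; wrap it so csv can read text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text, delimiter=";"))


demo = build_demo_zip()
with zipfile.ZipFile(demo) as archive:
    member_name = first_csv_member(archive)

# The member name is only a label inside the archive, not a local path,
# which is why passing it to a function expecting a path fails.
exists_on_disk = os.path.exists(member_name)

demo.seek(0)
rows = read_rows(demo)
```

Looking up the member with `infolist()` and `is_dir()` instead of `namelist()[0]` is a design choice on my part: some zip tools store the folder itself as the first entry, in which case the first name would be `folder3/` rather than the CSV.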