Tags: python, python-3.x, pandas, ftp, ftplib

Reading CSV file downloaded from FTP in Python not reading all rows


I am trying to read a CSV file from a folder on an FTP server. The file has 3072 rows, but when I run the code, not all of them are read: some rows at the bottom are missing.

import ftplib
import pandas as pd

## FTP host name and credentials
ftp = ftplib.FTP('IP', 'username', 'password')

## Go to the required directory
ftp.cwd("Folder_Name")

names = ftp.nlst()
final_names = [line for line in names if '.csv' in line]

latest_time = None
latest_name = None

#os.chdir(filepath)

for name in final_names:
    
    time1 = ftp.sendcmd("MDTM " + name)
    if (latest_time is None) or (time1 > latest_time):
        latest_name = name
        latest_time = time1

file = open(latest_name, 'wb')

ftp.retrbinary('RETR ' + latest_name, file.write)

dat = pd.read_csv(latest_name)
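The loop above picks the newest file by comparing the raw MDTM reply strings. That happens to work because the timestamp in the reply is fixed width, but parsing it makes the intent explicit. A minimal sketch, assuming the server answers MDTM in the RFC 3659 form "213 YYYYMMDDHHMMSS[.sss]"; the helper names mdtm_to_datetime and latest_csv are hypothetical, and the endswith filter is slightly stricter than the original '.csv' in line test.

```python
from datetime import datetime


def mdtm_to_datetime(reply):
    """Parse an MDTM reply such as '213 20210315080910' into a datetime.

    Assumes the RFC 3659 timestamp format 'YYYYMMDDHHMMSS[.sss]';
    any fractional seconds are dropped.
    """
    code, _, stamp = reply.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: " + reply)
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")


def latest_csv(ftp):
    """Return the name of the most recently modified .csv file, or None.

    `ftp` is an ftplib.FTP instance (or anything with compatible
    nlst() and sendcmd() methods).
    """
    names = [name for name in ftp.nlst() if name.lower().endswith(".csv")]
    # One MDTM round trip per candidate file; max() keeps the newest.
    return max(
        names,
        key=lambda name: mdtm_to_datetime(ftp.sendcmd("MDTM " + name)),
        default=None,
    )
```

With a connected ftp object this would replace the loop with latest_name = latest_csv(ftp).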

The CSV file on the FTP server looks like this:

[Image: the CSV file on the FTP server]

The output from the code is:

[Image: the output of the code]


Solution

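  • Make sure you close the file before you try to read it, using file.close() or, even better, a with statement. Until the file is closed (or flushed), the last chunk written by retrbinary can still sit in Python's write buffer instead of on disk, so pd.read_csv sees a truncated file, which is why the rows at the bottom go missing: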

    with open(latest_name, 'wb') as file:
        ftp.retrbinary('RETR ' + latest_name, file.write)
    
    dat = pd.read_csv(latest_name)
    

    If you do not need to store the file in the local file system, and the file is not too large, you can download it to memory only; see:
    Reading files from FTP server to DataFrame in Python


    That said, the pandas.read_csv documentation states that it supports FTP URLs directly,
    so this should work too:

    pd.read_csv("ftp://username:password@ftp.example.com/remote/path/" + latest_name)
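To see why closing the file matters, the sketch below simulates the download without a server: fake_retrbinary is a hypothetical stand-in for ftp.retrbinary that hands the data to the callback in chunks, as the real method does. The payload is deliberately smaller than the 8 KiB buffer passed to open(), so nothing reaches the disk until close(). With a 3072-row file, most of the data overflows the buffer and is written, and only the last partially filled buffer is missing, which matches the symptom in the question.

```python
import os
import tempfile


def fake_retrbinary(callback, data, blocksize=64):
    """Hypothetical stand-in for ftplib.FTP.retrbinary: feed data to callback in chunks."""
    for start in range(0, len(data), blocksize):
        callback(data[start:start + blocksize])


def count_lines(path):
    """Count the lines that are currently stored on disk at path."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


# 100 small CSV rows, well under the 8 KiB write buffer used below
payload = b"".join(b"%d,%d\n" % (i, i * i) for i in range(100))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")

    # Pattern from the question: the file is still open when it is read back
    file = open(path, "wb", buffering=8192)
    fake_retrbinary(file.write, payload)
    lines_while_open = count_lines(path)  # buffered bytes are not on disk yet
    file.close()  # close() flushes the remaining buffered bytes
    lines_after_close = count_lines(path)

    # Pattern from the answer: the with block closes the file before reading
    with open(path, "wb") as file:
        fake_retrbinary(file.write, payload)
    lines_with_block = count_lines(path)

print(lines_while_open, lines_after_close, lines_with_block)
```

Calling file.flush() before reading would also push the data out, but the with block guarantees the file is closed even if the download raises an exception.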
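For the in-memory option mentioned in the answer, a minimal sketch is to collect the download in an io.BytesIO object instead of a file. The helper name download_to_memory is hypothetical; it only assumes an ftplib.FTP instance (or an object with a compatible retrbinary method).

```python
import io


def download_to_memory(ftp, name):
    """Download remote file `name` into an in-memory binary buffer.

    `ftp` is an ftplib.FTP instance (or anything with a compatible
    retrbinary method); nothing is written to the local file system.
    """
    buffer = io.BytesIO()
    ftp.retrbinary("RETR " + name, buffer.write)
    buffer.seek(0)  # rewind so the reader starts at the first byte
    return buffer
```

With the question's variables, this would be used as dat = pd.read_csv(download_to_memory(ftp, latest_name)). Since no file is involved, there is nothing to forget to close, but the whole file has to fit in memory.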
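One caveat with the ftp:// URL approach: if the username or password contains characters such as @, : or /, they have to be percent-encoded, or they are misread as URL delimiters. A sketch of building the URL, where ftp_csv_url is a hypothetical helper and ftp.example.com a placeholder host; it assumes the URL is opened through urllib's FTP handler, which percent-decodes the credentials and path segments again (as far as I know, that is what pandas uses for ftp:// URLs).

```python
from urllib.parse import quote


def ftp_csv_url(user, password, host, remote_dir, name):
    """Build an ftp:// URL for pandas.read_csv, percent-encoding each part."""
    return "ftp://{}:{}@{}/{}/{}".format(
        quote(user, safe=""),  # encode '@', ':' and '/' too
        quote(password, safe=""),
        host,
        quote(remote_dir.strip("/")),  # keeps the '/' between directories
        quote(name, safe=""),
    )
```

For example, pd.read_csv(ftp_csv_url("username", "password", "ftp.example.com", "/remote/path/", latest_name)) is equivalent to the call above when the credentials contain no special characters.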