Search code examples
pythonpython-3.xauthenticationsftppysftp

Download fails with "Authentication failed" when we try to download 50+ files from SFTP serially using pysftp in Python?


for remote_path in list_of_stfp_paths:
    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        sftp.get(remote_path, str(local_path))

    #checks distinct count of a column for the csv downloaded, deletes it later
    df = pd.read_csv(str(local_path))
    print(df['taken_time'].value_counts())
    os.remove(str(local_path))

The code I use is above. It's just run in a for loop with multiple remote paths. Sometimes, it gets completed. Sometimes, I get an error saying

Exception: Authentication failed.


Solution

  • Do not reconnect for each file. Loop the downloads only, not the connection:

    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        for remote_path in list_of_stfp_paths:
            sftp.get(remote_path, str(local_path))
    
            #checks distinct count of a column for the csv downloaded, deletes it later
            df = pd.read_csv(str(local_path))
            print(df['taken_time'].value_counts())
            os.remove(str(local_path))
    

    Though note that you do not even have to download the files to a local disk, just read them straight from the SFTP server:

    with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD) as sftp:
        for remote_path in list_of_stfp_paths:
            with sftp.open(remote_path) as f:
                f.prefetch()
                #checks distinct count of a column for the csv
                df = pd.read_csv(f)
                print(df['taken_time'].value_counts())
    

    It might even be faster as it allows the download and parsing happen in parallel, not in sequence. See Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back