Search code examples
pythonpandaspostgresqlsftppysftp

Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back


I want to read some CSV/Excel files on a secure SFTP folder, make some changes (fixed changes in each file like remove column 2) in those files, upload them to a Postgre DB and also the upload them to a different SFTP path in Python

What's the best way to this?

I have made a connection to the SFTP using pysftp library and am reading the Excel:

import pysftp
import pandas as pd

myHostname = "*****"
myUsername = "****"
myPassword = "***8"
cnopts =pysftp.CnOpts()
cnopts.hostkeys = None  

sftp=pysftp.Connection(host=myHostname, username=myUsername, 
password=myPassword,cnopts=cnopts)
print ("Connection succesfully stablished ... ")
sftp.chdir('test/test')
#sftp.pwd
a=[]
for i in sftp.listdir_attr():
    with sftp.open(i.filename) as f:
        df=pd.read_csv(f)

How should I proceed with the upload to DB and making those changes to the CSV permanent?


Solution

  • You have the download part done.

    For the upload part, see How to Transfer Pandas DataFrame to .csv on SFTP using Paramiko Library in Python? – While it's for Paramiko, pysftp Connection.open method behaves identically to Paramiko SFTPClient.open, so the code is the same (though, you should not use pysftp).

    Full code can be like:

    with sftp.open("/remote/path/data.csv", "r+", bufsize=32768) as f:
        # Download CSV contents from SFTP to memory
        df = pd.read_csv(f)
    
        # Modify as you need (just an example)
        df.at[0, 'Name'] = 'changed'
    
        # Upload the in-memory data back to SFTP
        f.seek(0)
        df.to_csv(f, index=False)
        # Truncate the remote file in case the new version of the contents is smaller
        f.truncate(f.tell())
    

    The above updates the same file. If you want to upload to a different file, use this:

    # Download CSV contents from SFTP to memory
    with sftp.open("/remote/path/source.csv", "r") as f:
        df = pd.read_csv(f)
    
    # Modify as you need (just an example)
    df.at[0, 'Name'] = 'changed'
    
    # Upload the in-memory data back to SFTP
    with sftp.open("/remote/path/target.csv", "w", bufsize=32768) as f:
        df.to_csv(f, index=False)
    

    For the purpose of bufsize, see:
    Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow


    Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.