Tags: python, sftp, paramiko

Use pdfplumber and Paramiko to read a PDF file from an SFTP server


I have a direct connection to an SFTP server. The connection works without any problem, and I can list the files in the selected directory. The server holds files of different types, and I have several functions to read them. Below is the piece of code that handles .pdf files; I use pdfplumber to read them:

import pdfplumber

# ssh: an already connected paramiko.SSHClient (SSH.connect configuration omitted)
sftp = ssh.open_sftp()

path = "/server_path/.."
for filename in sftp.listdir(path):
    # fullpath: full server path with filename, like /server_path/../file.pdf
    # filename: filename without path, like file.pdf
    fullpath = path + "/" + filename
    if filename.endswith('.pdf'):
        with sftp.open(fullpath, 'rb') as fl:
            pdf = pdfplumber.open(fl)  # raises the TypeError shown below

In this for loop I want to read all the .pdf files in the chosen directory, and the same approach works for me on localhost without any problem.

I tried to open each file with a with sftp.open(fullpath, 'rb') as fl: block, but in this case that doesn't work and the following error appears:

Traceback (most recent call last):
pdf = pdfplumber.open(fl)
return cls(open(path, "rb"), **kwargs)
TypeError: expected str, bytes or os.PathLike object, not SFTPFile

pdfplumber.open takes the exact path to the file, including its name (in this case fullpath), as its argument. How can I solve this so that it works directly from the server? And how should I manage memory here, since I understand these files are somehow pulled into memory? Please give me some hints.
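For context on the error: the traceback line return cls(open(path, "rb"), **kwargs) shows that this pdfplumber version's open classmethod passes its argument straight to Python's built-in open(), which accepts a path or an integer file descriptor but not a file object. A minimal stdlib-only sketch reproducing the same kind of TypeError, assuming an io.BytesIO as a stand-in for Paramiko's SFTPFile (the helper name open_like_old_pdfplumber is hypothetical):

```python
import io


def open_like_old_pdfplumber(path):
    """Hypothetical helper mimicking the call in the traceback:
    the argument goes straight to the built-in open()."""
    return open(path, "rb")


# An in-memory binary stream stands in for paramiko's SFTPFile here.
fake_remote_file = io.BytesIO(b"%PDF-1.4 placeholder bytes")

error_message = None
try:
    open_like_old_pdfplumber(fake_remote_file)
except TypeError as exc:
    # Same error class as in the question; only the type name at the end differs.
    error_message = str(exc)
    print(error_message)
```

The built-in open() cannot read from an already open stream, so a library that only calls open(path) needs a separate entry point for file-like objects.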


Solution

  • Paramiko SFTPClient.open returns a file-like object.

    To use a file-like object with pdfplumber, it seems that you can use the load function:

    pdf = pdfplumber.load(fl)
    

    You will also want to read this:
    Reading file opened with Python Paramiko SFTPClient.open method is slow


    As the Paramiko file-like object seems to perform suboptimally when combined with the pdfplumber.load function, a workaround is to download the file to memory instead:

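    from io import BytesIO

    flo = BytesIO()
    sftp.getfo(fullpath, flo)  # download the remote file into the in-memory buffer
    flo.seek(0)  # rewind so pdfplumber reads from the start
    pdf = pdfplumber.load(flo)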
    

    See How to use Paramiko getfo to download file from SFTP server to memory to process it
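Applying the download-to-memory workaround to the whole directory loop from the question could look like the following minimal sketch. It assumes a current pdfplumber, whose open() accepts file-like objects; older releases such as the one in the traceback need pdfplumber.load instead. The names read_remote_pdfs and extract_text are hypothetical, and sftp only needs listdir() and getfo() as provided by paramiko.SFTPClient. On memory: each PDF is downloaded into its own BytesIO, parsed, and the buffer is closed before the next file is fetched, so only one file's bytes are held at a time.

```python
import io
import posixpath


def read_remote_pdfs(sftp, directory, parse):
    """Download each .pdf in `directory` into memory, one at a time,
    and pass the in-memory buffer to `parse`.

    `sftp` is assumed to offer listdir() and getfo() like
    paramiko.SFTPClient; `parse` is any callable taking a binary
    file-like object and returning whatever should be kept.
    """
    results = {}
    for filename in sftp.listdir(directory):
        if not filename.endswith(".pdf"):
            continue
        # SFTP servers use "/" separators regardless of the local OS.
        fullpath = posixpath.join(directory, filename)
        with io.BytesIO() as buffer:
            sftp.getfo(fullpath, buffer)  # copy the remote file into memory
            buffer.seek(0)  # rewind so the parser reads from the start
            # parse must extract what it needs now: the buffer is closed
            # when this block ends, freeing the downloaded bytes.
            results[filename] = parse(buffer)
    return results


def extract_text(buffer):
    """Hypothetical parse callback; requires pdfplumber to be installed."""
    import pdfplumber

    with pdfplumber.open(buffer) as pdf:
        # extract_text() can return None for pages without text.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

With the connection from the question, something like texts = read_remote_pdfs(ssh.open_sftp(), "/server_path/..", extract_text) would return a dict mapping each PDF filename to its extracted text. Returning the pdfplumber PDF object itself from parse would not work here, because it would still reference the closed buffer.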