
Is it possible to specify the encoding of a file with Paramiko?


I'm trying to read a CSV over SFTP using pysftp/Paramiko. My code looks like this:

import csv
import pysftp

input_conn = pysftp.Connection(hostname, username=username, password=password)
file = input_conn.open("Data.csv")
file_contents = list(csv.reader(file))

But when I do this, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 23: invalid start byte

I know that this means the file is expected to be in UTF-8 encoding but isn't. The strange thing is, if I download the file and then use my code to open the file, I can specify the encoding as "macroman" and get no error:
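The failing byte from the traceback can be decoded on its own to see why the two approaches behave differently. The bytes on the server and in the downloaded copy are identical. Only the codec applied to them differs. The sketch below uses a hypothetical helper, `decode_attempts`, that tries a few codecs on the same bytes. Note that 0x96 is also valid in other single-byte encodings such as cp1252, where it maps to a different character. The right codec therefore depends on how the file was produced; "macroman" is assumed correct here only because it is what the question uses.

```python
def decode_attempts(raw, encodings=("utf-8", "macroman", "cp1252")):
    """Hypothetical helper: decode the same bytes with several codecs."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as exc:
            # 0x96 is a UTF-8 continuation byte, so it cannot start a character.
            results[enc] = f"error: {exc.reason}"
    return results


# The byte reported in the traceback.
print(decode_attempts(b"\x96"))
# {'utf-8': 'error: invalid start byte', 'macroman': 'ñ', 'cp1252': '–'}
```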

import csv

with open("Data.csv", "r", encoding="macroman", newline="") as csvfile:
    file_contents = list(csv.reader(csvfile))

The Paramiko docs say that the encoding of a file is meaningless over SFTP, because SFTP treats all files as bytes. But then, how can I get Python's csv module to use the right encoding if I use Paramiko to open the file?


Solution

  • If the file is not huge, so that having it loaded into memory twice is not a problem, you can download it and decode the contents in memory:

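    import csv
    import io

    with io.BytesIO() as bio:
        # Download the raw bytes; SFTP itself does no decoding.
        input_conn.getfo("Data.csv", bio)
        bio.seek(0)

        # Decode the bytes as Mac OS Roman text for the csv module.
        with io.TextIOWrapper(bio, encoding='macroman', newline='') as f:
            file_contents = list(csv.reader(f))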

    Partially based on the question "Convert io.BytesIO to io.StringIO to parse HTML page".
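For files too large to hold in memory twice, one alternative (not part of the answer above) is to decode while reading instead of downloading first. This is a minimal sketch, assuming that opening the remote file in binary mode ("rb") through pysftp returns a Paramiko SFTPFile whose read() yields bytes. `codecs.getreader` wraps any object with a read() method in a stream that decodes incrementally, and csv.reader can iterate it line by line. `iter_csv_rows` is a hypothetical helper name, and io.BytesIO stands in for the remote file so the sketch runs without a server.

```python
import codecs
import csv
import io


def iter_csv_rows(binary_file, encoding="mac_roman"):
    """Hypothetical helper: yield CSV rows from a binary file-like object.

    Only binary_file.read() returning bytes is required, which is assumed
    to hold for a Paramiko SFTPFile opened in "rb" mode.
    """
    # StreamReader decodes chunk by chunk, so the whole file is never
    # held in memory as one bytes object.
    text_stream = codecs.getreader(encoding)(binary_file)
    yield from csv.reader(text_stream)


# Offline stand-in for: input_conn.open("Data.csv", "rb")
sample = "name,note\nJos\u00e9,ni\u00f1o\n".encode("mac_roman")
for row in iter_csv_rows(io.BytesIO(sample)):
    print(row)

# Assumed usage against the real server:
# with input_conn.open("Data.csv", "rb") as remote_file:
#     for row in iter_csv_rows(remote_file, "macroman"):
#         ...
```

Because the helper is a generator, rows are processed as they arrive; wrapping it in `list(...)` would bring back the full in-memory copy.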