python csv paramiko byte-order-mark

Can't encode csv file opened through paramiko as utf-8-sig to remove BOM using Python


I'm having trouble reading a CSV file that I open over SFTP with paramiko, because the first field starts with the byte order mark (BOM). From what I've read, decoding the file as utf-8-sig should fix this, but I can't figure out how to apply that encoding when opening the file through paramiko's ssh_client.

How do I apply the encoding after opening the file over SFTP? I'm using csv.DictReader to read the file.

with ssh_client.open_sftp() as sftp_client:
    with sftp_client.file(newFileName) as f:
        reader = csv.DictReader(f)
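To illustrate the symptom without an SFTP server, here is a minimal reproduction that uses made-up CSV bytes in memory in place of the remote file (the sample content is an assumption). Decoding as plain UTF-8 leaves the BOM in place as U+FEFF at the start of the first header, while utf-8-sig strips it:

```python
import csv
import io

# Hypothetical CSV content that starts with a UTF-8 BOM (bytes EF BB BF)
raw = b"\xef\xbb\xbfid,name\r\n1,alice\r\n"

# Plain UTF-8 keeps the BOM as U+FEFF glued to the first column name
plain = csv.DictReader(io.StringIO(raw.decode("utf-8")))
print(plain.fieldnames)  # ['\ufeffid', 'name']

# utf-8-sig removes a leading BOM during decoding
clean = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
print(clean.fieldnames)  # ['id', 'name']
```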

Solution

  • Paramiko's SFTPFile does not let you configure an encoding (although for some operations, such as readline and readlines, it decodes the data as UTF-8).

    But you should be able to skip the BOM yourself:

    import csv

    with sftp_client.file(newFileName) as f:
        # Skip the 3-byte UTF-8 BOM (EF BB BF); assumes the file starts with one
        f.seek(3)
        reader = csv.reader(f)
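If you cannot be sure the file actually starts with a BOM, seeking past three bytes would eat real data. A minimal alternative sketch is to read the raw bytes and decode them with utf-8-sig yourself, which drops a leading BOM when present and changes nothing otherwise. This assumes the whole file fits in memory and that `read()` on the paramiko file object returns bytes (as it does in current paramiko versions); the helper name `read_csv_without_bom` is hypothetical:

```python
import csv
import io


def read_csv_without_bom(remote_file):
    """Parse CSV rows from a file-like object whose read() returns bytes.

    Hypothetical helper: loads the whole file into memory, then decodes it
    with utf-8-sig so that a leading UTF-8 BOM, if any, is discarded.
    """
    text = remote_file.read().decode("utf-8-sig")
    # newline="" keeps line breaks inside quoted fields for the csv module
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Stand-in for the SFTP file; with paramiko you would pass the object
# returned by sftp_client.file(newFileName) instead.
sample = io.BytesIO(b"\xef\xbb\xbfid,name\r\n1,alice\r\n")
rows = read_csv_without_bom(sample)
print(rows)
```

With paramiko this would be called inside `with sftp_client.file(newFileName) as f:` as `rows = read_csv_without_bom(f)`. Reading everything up front trades memory for not having to guess whether a BOM is there.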