I'm trying to read a CSV over SFTP using pysftp/Paramiko. My code looks like this:
input_conn = pysftp.Connection(hostname, username, password)
file = input_conn.open("Data.csv")
file_contents = list(csv.reader(file))
But when I do this, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 23: invalid start byte
I know that this means the file is expected to be in UTF-8 encoding but isn't. The strange thing is, if I download the file and then use my code to open the file, I can specify the encoding as "macroman" and get no error:
with open("Data.csv", "r", encoding="macroman") as csvfile:
file_contents = list(csv.reader(csvfile))
The Paramiko docs say that the encoding of a file is meaningless over SFTP because it treats all files as bytes – but then, how can I get Python's CSV module to recognize the encoding if I use Paramiko to open the file?
If the file is not huge, so it's not a problem to have it loaded twice into the memory, you can download and convert the contents in memory:
with io.BytesIO() as bio:
input_conn.getfo("Data.csv", bio)
bio.seek(0)
with io.TextIOWrapper(bio, encoding='macroman') as f:
file_contents = list(csv.reader(f))
Partially based on Convert io.BytesIO to io.StringIO to parse HTML page.