I have read through similar questions on Stack Overflow, but none of them solves the Unicode problem I have: 'ascii' codec can't decode byte 0xc3 in position 302.
I have tried:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
but I get an error: NameError: name 'reload' is not defined
I am trying to read a file that contains the Danish vowels æ, ø, å. In return I get "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 302". Position 302 and the positions after it contain Danish vowels. Is there a way to fix this?
So far I have tried putting a specially formatted comment on the first line of the source code:
# -*- coding: <ascii> -*-
This had no effect.
I also tried:
f = open(fname, encoding="ascii", errors="surrogateescape")
But instead of reading the characters as they are, for example in the word "Europæiske", I get "Europ\udcc3\udca6iske".
Then I tried a suggestion from a blog (I have lost the link) to "import unicodedata", but it was not well explained where to take it from there.
import unicodedata
import csv
with open('File.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Simply open the file with the correct encoding. You have to know the encoding that the file was saved in. On Western versions of Windows it might be Windows-1252, or perhaps utf8. Modules such as chardet can make an educated guess. Also, for the csv module, open with newline='' as well (see the documentation for csv.reader):
import csv
with open('File.csv', encoding='utf8', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
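If the encoding is uncertain, the byte in the error message is a useful hint: 0xc3 followed by 0xa6 is how UTF-8 encodes æ, which is consistent with the "Europ\udcc3\udca6iske" output in the question. Below is a minimal sketch of a fallback reader, assuming the file is either UTF-8 or Windows-1252. The helper name `read_csv_rows` and the list of candidate encodings are illustrative assumptions, not part of the csv module.

```python
import csv

# The surrogates \udcc3 \udca6 stand for the raw bytes C3 A6,
# which is the UTF-8 encoding of the letter "æ".
assert b"\xc3\xa6".decode("utf-8") == "æ"


def read_csv_rows(path, encodings=("utf-8", "cp1252")):
    """Return all rows of a CSV file, trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption and should be
    adjusted to the encodings the file could plausibly have been saved in.
    """
    last_error = None
    for encoding in encodings:
        try:
            # newline='' lets the csv module handle line endings itself.
            with open(path, encoding=encoding, newline="") as f:
                return list(csv.reader(f))
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong guess, try the next candidate
    if last_error is None:
        raise ValueError("no candidate encodings were given")
    raise last_error
```

Calling `read_csv_rows('File.csv')` returns a list of rows that can be looped over and printed as before. UTF-8 is tried first on purpose: Windows-1252 accepts almost any byte sequence, so it would silently turn UTF-8 text into mojibake if it were tried first.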