I have a txt document containing the letters 'øæå', and I want this script to recognize these letters and write them correctly to the CSV file.
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
        for s in splitTab:
            newS = s[1:-1]
        date = splitTab[0].replace('.', '/')
        insertList = [date,]
        out.writerow(date)
Gives:
File "Q:\DropBox\Development\Scripts\tes2.py", line 17, in <module>
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte
with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:
        line = file.readline()
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')
Remove line = file.readline(). You are already iterating over (reading) the lines with the for line in file construct, so the extra call makes the loop skip every other line.
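A small sketch of why the extra readline() matters, assuming Python 3. The temporary file and its four short lines are made up for illustration; the point is only that the for loop and readline() advance the same reader.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: four lines, written as ISO-8859-1.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with codecs.open(path, 'w', 'ISO-8859-1') as f:
    f.write('one\ntwo\nthree\nfour\n')

seen = []
with codecs.open(path, 'r', 'ISO-8859-1') as f:
    for line in f:
        # The loop has already read one line; readline() reads the next
        # one and overwrites it, so the first line of each pair is lost.
        line = f.readline()
        seen.append(line.strip())

print(seen)
```

Only the second and fourth lines end up in seen; dropping the readline() call keeps all four.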
lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
wouldn't be what you want, as this encodes the text to ISO-8859-1 bytes and then tries to decode those bytes as if they were UTF-8. The byte 0xf8 ('ø' in ISO-8859-1) is not a valid UTF-8 start byte, hence the error. If you want to convert ISO-8859-1 to UTF-8, you'd normally want to do
lineS = line.decode('ISO-8859-1', 'ignore').encode('utf-8')
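To make the difference between the two orders concrete, here is a minimal sketch, assuming Python 3 (where decode() exists only on bytes, so the line above applies to raw bytes rather than to already decoded text). The variable names are illustrative only.

```python
# 'ø' is the single byte 0xf8 in ISO-8859-1.
raw = b'\xf8'

# Right order for raw bytes: decode from ISO-8859-1, then encode to UTF-8.
text = raw.decode('ISO-8859-1')
utf8_bytes = text.encode('utf-8')

# Order used in the question: encode to ISO-8859-1, then decode as UTF-8.
try:
    text.encode('ISO-8859-1').decode('utf-8')
    failed = False
except UnicodeDecodeError:
    # 0xf8 is not a valid UTF-8 start byte, which is the reported error.
    failed = True

print(repr(text), utf8_bytes, failed)
```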
However, you've already converted the data from ISO-8859-1 (to unicode) in the codecs.open() call, so you just need to do
lineS = line.encode('utf-8')
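Putting the pieces together, the loop might look roughly like the sketch below. It assumes Python 3 (the traceback wording suggests it), where the csv module takes text and the output file's encoding does the conversion, so no explicit encode() call is needed. The convert() helper, the out.csv name, the sample row, and the quote stripping are assumptions for illustration, not part of the original script.

```python
import codecs
import csv
import os
import tempfile


def convert(src_path, dst_path):
    """Read a ';'-separated ISO-8859-1 file and write its rows as UTF-8 CSV."""
    with codecs.open(src_path, 'r', 'ISO-8859-1') as src, \
            open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        out = csv.writer(dst)
        for line in src:  # no extra readline() here
            # line is already decoded text; no encode/decode round trip.
            # Assumption: fields are wrapped in double quotes.
            fields = [s.strip().strip('"') for s in line.split(';')]
            fields[0] = fields[0].replace('.', '/')  # date column
            out.writerow(fields)  # pass a list of fields, not a bare string


# Hypothetical sample input, only to make the sketch runnable.
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, 'transaksjonliste.txt')
dst_path = os.path.join(tmp, 'out.csv')
with codecs.open(src_path, 'w', 'ISO-8859-1') as f:
    f.write('"01.02.2013";"Kj\xf8p \xf8\xe6\xe5"\n')

convert(src_path, dst_path)
with open(dst_path, encoding='utf-8') as f:
    result = f.read()
print(result)
```

Passing a list to writerow() matters here: given a plain string such as date, csv.writer treats each character as a separate field.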