Converting ISO-8859-1 to utf-8 (øæå)

I have a text file containing the letters 'øæå', and I want this script to recognize these letters and write them correctly to the CSV file.

with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:

        line = file.readline() 
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')

        for s in splitTab:
            newS = s[1:-1]

        date = splitTab[0].replace('.', '/')
        insertList = [date,]
        out.writerow(date)

Gives:

  File "Q:\DropBox\Development\Scripts\tes2.py", line 17, in <module>
    lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte

Solution

with codecs.open('transaksjonliste.txt', 'r', 'ISO-8859-1') as file:
    for line in file:

        line = file.readline() 
        lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')
        splitTab = lineS.split(';')

Remove line = file.readline(). You are already reading the file line by line with the for line in file construct, so the extra call consumes one more line on every pass and you end up processing only every other line.
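To see the effect of the extra readline() call in isolation, here is a minimal sketch. It uses an in-memory io.StringIO with made-up sample lines instead of the real transaksjonliste.txt, and the two helper function names are hypothetical, chosen only for this illustration.

```python
import io

# Made-up sample data standing in for the real input file (assumption).
SAMPLE = "line 1\nline 2\nline 3\nline 4\n"


def read_with_extra_readline(text):
    """Mimic the loop from the question: iterate AND call readline()."""
    seen = []
    f = io.StringIO(text)
    for line in f:
        # This replaces the line the for loop just read with the next one,
        # so the line from the loop itself is thrown away.
        line = f.readline()
        seen.append(line.rstrip("\n"))
    return seen


def read_plain(text):
    """Let the for loop do all the reading."""
    f = io.StringIO(text)
    return [line.rstrip("\n") for line in f]
```

With the sample above, the first function only ever sees the second and fourth lines, while the second one sees all four.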

lineS = line.encode('ISO-8859-1', 'ignore').decode('utf-8')

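isn't what you want: it encodes the text to ISO-8859-1 and then tries to decode those ISO-8859-1 bytes as if they were UTF-8. That is where the error comes from: 0xf8 is 'ø' in ISO-8859-1, but it is not a valid start byte in UTF-8. If you had raw ISO-8859-1 bytes and wanted to convert them to UTF-8, you'd normally do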

 lineS = line.decode('ISO-8859-1', 'ignore').encode('utf-8')

However, codecs.open() has already decoded the data from ISO-8859-1 (to unicode) for you. So you just need to do

  lineS = line.encode('utf-8')
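Putting this together, here is a rough sketch of the whole conversion. It assumes Python 3 (the wording of the traceback suggests it), where the csv module works with text, so instead of calling encode() on each line you can let the output file do the encoding. The convert function name and its path arguments are hypothetical, and the snippet only handles the date column the way the question does; adjust it to your real data.

```python
import csv

# The same three letters in both encodings, to show what actually changes.
TEXT = "øæå"
LATIN1_BYTES = TEXT.encode("ISO-8859-1")  # one byte per letter
UTF8_BYTES = TEXT.encode("utf-8")         # two bytes per letter

# Decoding the ISO-8859-1 bytes as UTF-8 fails, as in the question.
try:
    LATIN1_BYTES.decode("utf-8")
    DECODE_ERROR = None
except UnicodeDecodeError as exc:
    DECODE_ERROR = str(exc)


def convert(src_path, dst_path):
    """Read a ';'-separated ISO-8859-1 file and write it as a UTF-8 CSV.

    Hypothetical helper: the input is decoded while reading and the
    output is encoded while writing, so no manual encode/decode is needed.
    """
    with open(src_path, "r", encoding="ISO-8859-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        out = csv.writer(dst)
        for line in src:
            fields = line.rstrip("\r\n").split(";")
            # Same date tweak as in the question: 01.02.2013 -> 01/02/2013
            fields[0] = fields[0].replace(".", "/")
            out.writerow(fields)
```

You would call it as convert('transaksjonliste.txt', 'out.csv'), where the output name is just a placeholder. Note that writerow() expects a list of fields; passing a plain string, as in out.writerow(date), writes each character as its own column.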