python csv utf-8 encode

Parse bytes to str while reading csv with Python


I have Python code that writes UTF-8 strings to a CSV file and reads them back:

import csv

test1='ab"cc"dd'.encode('utf8')
test2='bbb'.encode('utf8')
csv_file = open('test.csv','w')
writer= csv.writer(csv_file)
writer.writerow([test1,test2])
csv_file.close()

with open('test.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    print(spamreader)
    for row in spamreader:
        print(', '.join(row))

The problem is that when I read the file back, I get b'ab"cc"dd', b'bbb' instead of ab"cc"dd, bbb.

How can I decode those strings? (I must write UTF-8 to the CSV file.)


Solution

  • There is no need to encode or decode manually. Pass the encoding you want to open(), because the default encoding varies with the OS configuration. The file object then encodes on write and decodes on read, and the script works only with Unicode strings in between. This pattern is called the "Unicode sandwich".

    Also, csv.reader and csv.writer expect Unicode strings, so passing encoded byte strings is incorrect: csv.writer converts non-string values with str(), which is why the values came back as the literal text b'ab"cc"dd' and b'bbb'.

    import csv
    
    test1 = 'ab"cc"dd'
    test2 = 'bbb'
    with open('test.csv', 'w', encoding='utf8', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([test1, test2])
    
    with open('test.csv', encoding='utf8', newline='') as csvfile:
        spamreader = csv.reader(csvfile)
        for row in spamreader:
            print(row)
            print(', '.join(row))
    
    Output:
    
    ['ab"cc"dd', 'bbb']
    ab"cc"dd, bbb
    
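    To see where the b'...' text in the question comes from, here is a minimal sketch that writes one bytes value and one str value to an in-memory buffer. The io.StringIO buffer and the sample values are assumptions made for this illustration; the original code used a file on disk.

```python
import csv
import io

# Write one bytes value and one str value to an in-memory CSV buffer.
buffer = io.StringIO()
csv.writer(buffer).writerow(['bbb'.encode('utf8'), 'bbb'])

# csv.writer calls str() on non-string values, so the bytes object is stored
# as the text of its repr (b'bbb') rather than as the characters it encodes.
written = buffer.getvalue()
print(written)

# Reading the line back returns that literal text unchanged.
row = next(csv.reader(io.StringIO(written)))
print(row)
```

    The first field comes back as a six-character string that merely looks like a bytes literal, so there is nothing left to decode at read time; the fix has to happen on the writing side.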

    Additionally, if you want your CSV files to open correctly in Microsoft Excel, use utf-8-sig as the encoding. It writes a byte order mark (BOM) at the start of the file; without it, Excel won't detect UTF-8 properly.
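
    As a rough sketch of the utf-8-sig variant, the snippet below writes a row, checks the first three bytes of the file, and reads the row back. The temporary directory, the file name excel_test.csv, and the sample values are hypothetical and chosen only for this example.

```python
import csv
import os
import tempfile

# Hypothetical location and sample data, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'excel_test.csv')

# utf-8-sig writes the UTF-8 byte order mark before the first character.
with open(path, 'w', encoding='utf-8-sig', newline='') as csv_file:
    csv.writer(csv_file).writerow(['ab"cc"dd', 'café'])

# Inspect the raw bytes: the file starts with the three-byte BOM.
with open(path, 'rb') as raw_file:
    first_bytes = raw_file.read(3)
print(first_bytes)

# Reading with utf-8-sig strips the BOM, so the first field is clean.
with open(path, encoding='utf-8-sig', newline='') as csv_file:
    rows = list(csv.reader(csv_file))
```

    Reading such a file with plain utf8 instead would leave the BOM character attached to the first field, so it is safest to use utf-8-sig on both the write and the read.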