I have Python code that writes UTF-8 strings to a CSV file and reads them back:
import csv
test1='ab"cc"dd'.encode('utf8')
test2='bbb'.encode('utf8')
csv_file = open('test.csv','w')
writer= csv.writer(csv_file)
writer.writerow([test1,test2])
csv_file.close()
with open('test.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    print(spamreader)
    for row in spamreader:
        print(', '.join(row))
The problem is that when I read the file back, I get b'ab"cc"dd', b'bbb'
instead of ab"cc"dd, bbb
How can I decode that string? (The CSV file must contain UTF-8.)
There is no need to encode or decode manually. Open the file with the encoding you want, stated explicitly, because the default encoding varies with OS configuration. This pattern is called the "Unicode sandwich": decode when reading the file, encode when writing it, and work only with Unicode strings inside the Python script.
Also, csv.reader and csv.writer expect Unicode strings, so providing encoded byte strings is incorrect.
import csv
test1 = 'ab"cc"dd'
test2 = 'bbb'
with open('test.csv', 'w', encoding='utf8', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([test1, test2])
with open('test.csv', encoding='utf8', newline='') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        print(row)
        print(', '.join(row))
['ab"cc"dd', 'bbb']
ab"cc"dd, bbb
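For reference, the b'...' text in the question appears because csv.writer converts any non-string value with str(), and str() of a bytes object is its repr. Here is a minimal sketch of that behaviour. The in-memory io.StringIO buffer is my own choice so the example does not touch the disk; it is not part of the original code.

```python
import csv
import io

# In-memory text buffer standing in for the CSV file (an assumption for
# this sketch); newline='' leaves the csv module's line endings untouched.
buf = io.StringIO(newline='')

# Passing bytes: csv.writer stringifies it with str(), i.e. writes its repr.
csv.writer(buf).writerow(['ab"cc"dd'.encode('utf8'), 'bbb'])

written = buf.getvalue()
print(written)  # the field literally starts with b' and has doubled quotes

# Reading back yields the repr text as an ordinary str, not decoded data.
buf.seek(0)
row = next(csv.reader(buf))
print(row)
```

The value read back is an ordinary string that happens to begin with the characters b', so there is nothing to decode at that point. The fix is to write str values in the first place.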
Additionally, if you want your .csv files to open correctly in Microsoft Excel, use utf-8-sig as the encoding; otherwise Excel won't detect UTF-8 properly.
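A short sketch of the utf-8-sig variant, assuming a throwaway file in a temporary directory (the path and the sample value 'naïve' are hypothetical, chosen only for illustration). The codec writes a byte order mark at the start of the file and strips it again on reading.

```python
import csv
import os
import tempfile

# Hypothetical scratch location for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'test_excel.csv')

# utf-8-sig prepends the UTF-8 byte order mark (BOM) when writing.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f).writerow(['ab"cc"dd', 'naïve'])

# Inspect the raw bytes: the file starts with the 3-byte BOM.
with open(path, 'rb') as f:
    raw = f.read()
print(raw[:3])

# Reading with utf-8-sig removes the BOM, so the first field is clean.
with open(path, encoding='utf-8-sig', newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```

Reading the same file with plain utf8 would leave the BOM character attached to the first field, so it is worth using utf-8-sig on both sides when the file is meant for Excel.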