I have an issue with read_csv, and it's taking a lot of time to resolve.
I am working with texts that contain many special characters, so I checked which character does not appear in any of the texts and chose § as the delimiter when writing the CSV files that pair each text with its corresponding ID.
However, while reading the files, I am getting the following error. I could skip the bad lines, but in this case I cannot afford to lose any texts.
ParserError: '§' expected after '"'
Writing
df.to_csv('20231010.csv',
          index=False,
          sep='§',
          # header=None,
          quoting=csv.QUOTE_NONE,
          quotechar="",
          escapechar=" ")
Reading
data = pd.read_csv('20231010.csv', sep='§', encoding='utf-8')
It doesn't make sense to disable quoting, and you don't even need a fancy delimiter; just use the default settings:
import pandas as pd

df = pd.DataFrame({'text1': ['abc"123§', 'def ,456'],
                   'text2': ['ghi`789', 'jkl|123']})
df.to_csv('20231010.csv', index=False)
CSV:
text1,text2
"abc""123§",ghi`789
"def ,456",jkl|123
Importing again:
df2 = pd.read_csv('20231010.csv')
print(df2)
Output:
      text1    text2
0  abc"123§  ghi`789
1  def ,456  jkl|123
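To confirm that nothing changed in the round trip, you can compare the two frames directly (using the df and df2 from above):

print(df.equals(df2))  # True: same values, dtypes and index after the round trip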
Pandas can import/export a CSV file without changes fairly reliably. The most common things that could cause a change are:

- the index (written by default by to_csv), which gets converted to a regular column by read_csv
- missing-value strings such as NULL/NA, which read_csv converts to NaN, which can be annoying if those strings have a different meaning in your context

You can avoid these issues by using index=False in to_csv (as you did), and keep_default_na=False in read_csv.
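As a minimal sketch of that combination (the file name and sample data here are just placeholders):

import pandas as pd

df = pd.DataFrame({'id': [1, 2],
                   'text': ['NA', 'some "quoted" text, with commas']})
df.to_csv('roundtrip.csv', index=False)                     # no extra index column written
df2 = pd.read_csv('roundtrip.csv', keep_default_na=False)   # 'NA' stays a literal string
print(df2)

Note that keep_default_na=False also means genuinely empty fields come back as empty strings rather than NaN, so only use it if that is what you want.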