python pandas parse-error read-csv

Pandas.read_csv ParserError '§' expected after '"' with sep = "§"


I have an issue with read_csv, and it's taking a lot of time to resolve.

I am working with texts that contain many special characters, so I looked for a character that does not occur in any of the texts and chose § as the delimiter when writing the CSV files, which store each text alongside its ID.

However, while reading the files, I am getting the following error. I could skip the bad lines, but in this case I cannot afford to lose any texts.

ParserError: '§' expected after '"'

Writing

import csv
import pandas as pd

df.to_csv('20231010.csv',
          index=False,
          sep='§',
          #header=None,
          quoting=csv.QUOTE_NONE,
          quotechar="",
          escapechar=" ")

Reading

data = pd.read_csv('20231010.csv', sep="§", encoding='utf-8')

Solution

  • It doesn't make sense to disable quoting: quoting is what allows a field to contain the delimiter or a quote character. You don't even need an unusual delimiter; just use the default settings:

    df = pd.DataFrame({'text1': ['abc"123§', 'def ,456'],
                       'text2': ['ghi`789', 'jkl|123'],
                      })
    
    df.to_csv('20231010.csv', index=False)
    

    CSV:

    text1,text2
    "abc""123§",ghi`789
    "def ,456",jkl|123
    

    Importing again:

    df2 = pd.read_csv('20231010.csv')
    print(df2)
    

    Output:

          text1    text2
    0  abc"123§  ghi`789
    1  def ,456  jkl|123
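
If the file has already been written with § as the separator and cannot be regenerated, one possible workaround is to disable quoting on the reading side as well. The sketch below is an illustration under stated assumptions, not part of the original answer: the sample content in `raw` is made up, and it assumes the error is triggered by a field that starts with a double quote, which the parser treats as an opening quote. Any characters the writer inserted through escapechar would presumably stay in the text and need separate cleanup.

```python
import csv
import io

import pandas as pd

# Hypothetical sample resembling a file written with sep="§" and no quoting.
raw = 'id§text\n1§"quoted" start of a text\n2§plain text\n'

# With default quoting, the leading '"' opens a quoted field, and the parser
# then expects the separator right after the closing '"'.
try:
    pd.read_csv(io.StringIO(raw), sep="§", engine="python")
except pd.errors.ParserError as exc:
    print("default quoting failed:", exc)

# With quoting disabled, '"' is treated as an ordinary character.
data = pd.read_csv(
    io.StringIO(raw),
    sep="§",
    quoting=csv.QUOTE_NONE,
    engine="python",
)
print(data)
```

The engine is set to "python" explicitly because pandas appears to fall back to it anyway for a separator that is longer than one byte in UTF-8, and naming it avoids the fallback warning.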
    

    Pandas can generally round-trip a CSV file (export, then import) without altering the data. The most common causes of a change are:

    • the default inclusion of the index in to_csv, which read_csv then reads back as an extra column
    • conversion of specific strings to NaN (e.g. NULL/NA), which can be annoying if those strings have a different meaning in your context

    You can avoid these issues by using index=False in to_csv (as you did), and keep_default_na=False in read_csv.
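
The two pitfalls above can be reproduced with a small sketch. The DataFrame contents and variable names here are made up for illustration; the CSV is kept in memory with io.StringIO instead of being written to disk.

```python
import io

import pandas as pd

# Hypothetical data in which "NA" and "NULL" are real text values.
df = pd.DataFrame({"id": [1, 2, 3], "text": ["NA", "NULL", "regular text"]})

# Pitfall 1: the index is written by default and comes back as a column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())

# Pitfall 2: strings such as "NA" and "NULL" are read as missing values.
csv_text = df.to_csv(index=False)
default_read = pd.read_csv(io.StringIO(csv_text))
print(default_read["text"].isna().tolist())

# keep_default_na=False keeps those strings as they were written.
literal_read = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(literal_read["text"].tolist())
```

Note that keep_default_na=False also stops empty fields from being read as NaN, so it is worth checking whether real missing values in the data rely on that behaviour.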