
Handling CSV files with quoted semicolons in a Pandas data frame


So I am facing the following problem:

I have a ;-separated CSV in which some fields contain a ; enclosed in quotes, and this is corrupting the data.

For example: abide;acdet;"adds;dsss";acde

The ; inside "adds;dsss" is moving "dsss" to the next line and corrupting the results of the ETL module I am writing. My ETL takes such a CSV from the internet, transforms it (by first loading it into a Pandas data frame, pre-processing it, and then saving it), and then loads it into SQL Server. But the corrupted files are breaking the SQL Server schema.

Is there a solution I can use with a Pandas data frame that fixes this issue during the read (pd.read_csv), the write (DataFrame.to_csv), or both?


Solution

  • You might need to tell the reader some fields may be quoted:

    pd.read_csv(your_data, sep=';', quotechar='"')
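As a rough end-to-end sketch of the read and write steps, assuming the file has no header row (the sample line in the question does not show one) and using in-memory buffers in place of the real file, something like the following should keep the quoted field intact. The variable names are only illustrative.

```python
import csv
import io

import pandas as pd

# Stand-in for the downloaded file: the sample row from the question.
raw = 'abide;acdet;"adds;dsss";acde\n'

# Read: ';' is the delimiter, '"' marks quoted fields, so the ';'
# inside the quotes is treated as data rather than as a separator.
# header=None is an assumption: the sample shows no header row.
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar='"', header=None)

# Write: QUOTE_MINIMAL re-quotes any field that contains the delimiter.
out = io.StringIO()
df.to_csv(
    out,
    sep=";",
    index=False,
    header=False,
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
)
written = out.getvalue()

# Read the written text back to check that the columns still line up.
round_trip = pd.read_csv(io.StringIO(written), sep=";", quotechar='"', header=None)
```

If the downstream loader does not understand quoted fields at all, quoting on write will not help; in that case it may be safer to write with a delimiter that never occurs in the data.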