I am facing the following problem: I have a ;-separated CSV in which some fields contain a ; enclosed in quotes, and this is corrupting the data. For example:
abide;acdet;"adds;dsss";acde
The ; inside "adds;dsss" moves dsss" to the next line and corrupts the results of the ETL module I am writing. My ETL takes such a CSV from the internet, transforms it (by loading it into a pandas DataFrame, pre-processing it, and saving it), and then loads it into SQL Server. The corrupted files are breaking the SQL Server schema.
Is there a solution I can use with a pandas DataFrame to fix this issue during the read (pd.read_csv), during the write (DataFrame.to_csv), or both?
You might need to tell the reader some fields may be quoted:
pd.read_csv(your_data, sep=';', quotechar='"')
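A minimal sketch of the read and write steps together, assuming the file looks like the row in the question. The header row a;b;c;d and the names raw, df and out are made up for illustration, and the data is read from an in-memory string rather than from your actual source:

```python
import csv
import io

import pandas as pd

# Hypothetical sample mirroring the row in the question; the header row is
# an assumption, added only so the columns have names.
raw = 'a;b;c;d\nabide;acdet;"adds;dsss";acde\n'

# Reading: sep=';' sets the delimiter and quotechar='"' marks quoted fields,
# so the ';' inside the quotes stays within a single field.
df = pd.read_csv(io.StringIO(raw), sep=";", quotechar='"')
print(df.shape)
print(df.loc[0, "c"])

# Writing: QUOTE_MINIMAL quotes only the fields that contain the delimiter,
# the quote character, or a line break, so the embedded ';' is protected again.
out = df.to_csv(sep=";", index=False, quotechar='"', quoting=csv.QUOTE_MINIMAL)
print(out)
```

Note that quotechar='"' and csv.QUOTE_MINIMAL are already the pandas defaults, so passing them mostly makes the intent explicit. If the rows still split, it may be worth checking whether something else in the pipeline (for example a quoting=csv.QUOTE_NONE setting, or whatever reads the saved file afterwards) ignores the quotes.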