I have a CSV file with millions of rows. Here is the command I am using to load the data:
load data local infile 'myfile' into table test.mytable
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\n' ignore 1 lines
This handles almost everything except some lines where there are unescaped double quotes inside a double-quoted string, as in:
"first column",second column,"third column has "double quotes" inside", fourth column
It truncates the third column and gives me a warning that the row does not contain data for all columns.
Appreciate your help
The CSV is broken. There is no way MySQL or any other program can import it reliably. Double quotes inside a quoted column need to be escaped.
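For reference, here is what the problem row from the question would look like with the inner quotes escaped (assuming MySQL's default backslash escape character for LOAD DATA):

```
"first column",second column,"third column has \"double quotes\" inside",fourth column
```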
You might fix the CSV with a script. If a quote doesn't have a comma directly before or after it, it's probably part of the text and should be escaped.
The following regular expression uses negative lookbehinds and lookaheads to find quotes that don't have a comma (or the start/end of the line) right before or after them:
/(?<!^)(?<!,)(\s*)"(\s*)(?!,)(?!$)/
On the command line you can run:
perl -pe 's/(?<!,)(?<!^)(\s*)"(\s*)(?!,)(?!$)/\1\\"\2/g' data.csv > data-fixed.csv
Note that this method isn't foolproof. If a double quote does have a comma next to it but is still part of the text, there is little you can do to fix the CSV automatically. In that case, the script simply has no way of knowing whether it's a column delimiter or not.
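If you'd rather not shell out to Perl, the same idea can be sketched in Python with the standard `re` module. The pattern is the one shown above; the helper name `escape_inner_quotes` is just for illustration:

```python
import re

# Match a quote that is not at the start/end of the line and not
# adjacent to a comma -- i.e. a quote that is likely part of the text
# rather than a field delimiter. Same pattern as the Perl one-liner.
PATTERN = re.compile(r'(?<!^)(?<!,)(\s*)"(\s*)(?!,)(?!$)')

def escape_inner_quotes(line: str) -> str:
    """Backslash-escape quotes that look like text, not delimiters."""
    return PATTERN.sub(r'\1\\"\2', line)

row = '"first column",second column,"third column has "double quotes" inside", fourth column'
print(escape_inner_quotes(row))
# "first column",second column,"third column has \"double quotes\" inside", fourth column
```

Run it over the whole file line by line (or via `fileinput`) before handing the result to LOAD DATA INFILE. The same caveat applies: quotes that happen to sit next to a comma but belong to the text will slip through.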