Search code examples
mysqlcsvdouble-quotesload-data-infile

MySQL Load data infile -- double quotes in a double quoted value as "a "double" quoted value"


I have a csv file with millions of rows. Here is the command I am using to load data

load data local infile 'myfile' into table test.mytable
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\n' ignore 1 lines

This caters almost everything except some of the lines where there are double quotes inside a double quoted string. as in

"first column",second column,"third column has "double quotes" inside", fourth column

It truncates the third column and give me warning as this row does not contain data for all columns.

Appreciate your help


Solution

  • The CSV is broken. There is no way MySQL or any program can import it. The double quotes needed to be escaped if inside a column.

    You might fix the CSV with a script. If the quotes doesn't have a comma in front or behind it, it's probably part of the text and should be escaped.

    The following regular expression will do a negative lookbehind and lookahead to find quotes that don't have a quote right in front or behind it.

    /(?<!^)(?<!,)(\s*)"(\s*)(?!,)(?!$)/
    

    See it on regex101


    On the command like you can run

    perl -pe 's/(?<!,)(?<!^)(\s*)"(\s*)(?!,)(?!$)/\1\\"\2/g' data.csv > data-fixed.csv
    

    Note that this method isn't fool proof. If there is a double quote that does have a comma behind it but is part of the text, there is little you can do to fix the CSV. In that case, the script simply has no way of knowing if it's a column delimiter or not.