Search code examples
mysqlnullload-data-infile

How to disable insertion of double quotes when loading from local infile into table in mysql?


While importing a tab-separated-values (TSV) file, I find that there are two cases where mysql injects double quotes that I would like to disable.

Case 1: When the field value is empty/blank (zero characters) - instead of a NULL, a "" gets inserted. I prefer it be just a NULL.

Case 2: When a double quote is already in the string (i.e. 2" Square), it replaces the occurrence of the double quote with two double quotes and it also wraps the entire string in double quotes. Example: 2" Square gets inserted as "2"" Square". I want it to be just 2" Square, just as it is in the report.tsv file.

The current SQL I am using is (report-columns is a list of columns each with the definition VARCHAR(25) DEFAULT NULL)

LOAD DATA LOCAL INFILE 'report.tsv' 
INTO TABLE properties 
IGNORE 1 LINES (report-columns)

I tried adding FIELDS ESCAPED BY '' but this didn't change the results.

I prefer a plan to be able to just disable this from happening in the first place in the same query that loads the file, but if that is not possible a backup plan would be to execute a cleanup query for all report-columns to be run immediately after import to address both cases 1 and 2 above.

Ideas on either plan?

Update

After further investigation into this I determined that it was not mysql's loading of the data that was causing case 2, rather it was a clean script just before the load. The answer for using SET and NULLIF in the same query with the LOAD DATA LOCAL INFILE statement has resolved case 1 nicely without needing a second query cleanup.


Solution

  • If you want a NULL to be inserted, you can use a literal \N in your source text file, or the word NULL, depending on some of your options. This is documented in the manual for LOAD DATA INFILE: https://dev.mysql.com/doc/refman/5.7/en/load-data.html

    Look for the phrase "Handling of NULL values varies according to the FIELDS and LINES options in use" and read what follows (I'm not going to copy the manual into this Stack Overflow answer).

    Another option is to use an expression to modify the value on import if it's an empty string:

    LOAD DATA LOCAL INFILE 'report.tsv' 
    INTO TABLE properties 
    IGNORE 1 LINES (report-columns)
    SET some_col = NULLIF(some_col, '');
    

    I'm not sure what to recommend for the second case, when you have double-quotes in your input text data. I tried this, and it worked fine for me. I was able to import the string 2" square.

    It could be complicated if you use LOAD DATA INFILE with the ENCLOSED BY '"' option. You might want to enclose fields by some different character that doesn't appear in the text.