Search code examples
mysqlload-data-infile

MySQL LOAD DATA LOCAL INFILE avoiding duplicates


I'm using this LOAD DATA LOCAL INFILE statement to quickly upload files.

LOAD DATA LOCAL INFILE 'file_location/file.txt'
    INTO TABLE mytable 
    FIELDS TERMINATED by '\t'
    LINES TERMINATED BY '\n'
    IGNORE 18 LINES //ignore some header info
    (sampleID, testID);

But there are some files that I've already loaded in the past and I want to avoid duplications.

The database structure includes an auto incrementing PK as an additional column.

Without having to open each file and do a search prior to uploading in order to determine whether those records are already there, is there a way of modifying the command I am using to ignore duplicate rows based on having both sampleID and testID rows in table being matched to the incoming file?

For example, supposing the current table has the following rows:

tablePKID    sampleID    testID
    0         0001        A
    1         0001        B
    2         0003        A
    3         0003        B

And I wish to import a file with the following records

  0001        A
  0001        B
  0001        C
  0005        A
  0005        B

How can I ensure only

  0001        C
  0005        A
  0005        B

are imported?


Solution

  • I don't know how you got tablePKID to auto increment from 0 putting that to one side for now you can specify a unique key on samplid and testid and load..IGNORE for example

    DROP TABLE IF EXISTS T;
    CREATE TABLE T
    (tablePKID   INT auto_increment primary key, 
    sampleID VARCHAR(10),   testID VARCHAR(10));
    
    alter table t
        add unique key k1(sampleid,testid);
    
    INSERT INTO T (sampleid,testid) VALUES
    (   '0001'   ,     'A'),
    (   '0001'   ,     'B'),
    (   '0003'   ,     'A'),
    (   '0003'   ,     'B');
    

    File data.txt

    sampleid    testid
    "0001"  "A"
    "0001"  "B"
    "0001"  "C"
    "0005"  "A"
    "0005"  "B"
    
    
    LOAD DATA LOCAL INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
         ignore
        INTO TABLE t 
        FIELDS TERMINATED by '\t'
        enclosed by '"'
        LINES TERMINATED BY '\r\n'
        IGNORE 1 LINES 
        (sampleID, testID);
    
    +-----------+----------+--------+
    | tablePKID | sampleID | testID |
    +-----------+----------+--------+
    |         1 | 0001     | A      |
    |         2 | 0001     | B      |
    |         3 | 0003     | A      |
    |         4 | 0003     | B      |
    |         5 | 0001     | C      |
    |         6 | 0005     | A      |
    |         7 | 0005     | B      |
    +-----------+----------+--------+
    7 rows in set (0.00 sec)