Search code examples
mysqlsqlload-data-infile

LOAD DATA INFILE - Desperately Stuck


Hopefully I can explain this well enough... I have a table called artists with essentially 2 columns, artist_pk (primary key) and artist_name. I am importing records for artist_name column via a CSV file to table artists using LOAD DATA INFILE, and letting MySQL provide the artist_pk value. This is all done.

Now I want to import data (via same import method) for my table called albums. The three relevant columns here are album_pk (primary), album_name and artist_pk (foreign key from artists table). Again, I will let mySQL assign the album_pk values.

In my CSV data I have album_name and artist_name. I do NOT have the artist_pk values. So my question is: Can I import the artist_name CSV column as part of my tables import, but rather than use it as is, instruct mySQL to use the associated artist_pk PRIMARY KEY value from the artists table?

e.g. A record in artists table:

artist_pk | artist_name  |
+-----------+--------------+
|       1 | Depeche Mode |
+-----------+--------------+

And now an excerpt from my CSV file (that I want to put in my albums table).

album_name  artist_name
Violator    Depeche Mode

'Violator' will populate albums.album_name. But to populate albums.artist_pk, I want MySQL to use 'Depeche Mode', which is in artists.artist_name, to go and get its associated artist_pk value (in this case 1) - and this is the value that will go in table albums.

Thanks in advance!


Solution

  • Assuming that artists table is already populated you can leverage session variables and SET clause in LOAD DATA INFILE to do necessary lookups while loading your data

    LOAD DATA INFILE '/path/to/albums.txt'
    INTO TABLE albums
    FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
    (album_name, @artist_name) -- use a session variable to store a value read from the file
    SET artist_pk = 
    (
      SELECT artist_pk 
        FROM artists
       WHERE artist_name = @artist_name -- use a session variable to get an artist_pk
       LIMIT 1 -- this is to ensure that only one row will be returned
    ) 
    

    Let's try it out

    mysql> CREATE TABLE artists (`artist_pk` int not null auto_increment primary key, `artist_name` varchar(12));
    Query OK, 0 rows affected (0.07 sec)
    
    mysql> INSERT INTO artists (`artist_name`) VALUES ('Depeche Mode');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> CREATE TABLE albums (`album_pk` int not null auto_increment primary key, album_name varchar(255), artist_pk int, foreign key (artist_pk) references artists(artist_pk));
    Query OK, 0 rows affected (0.03 sec)
    
    mysql> LOAD DATA INFILE '/tmp/albums.txt'
        -> INTO TABLE albums
        -> FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"'
        -> LINES TERMINATED BY '\n'
        -> IGNORE 1 LINES
        -> (album_name, @artist_name)
        -> SET artist_pk = 
        -> (
        ->   SELECT artist_pk
        ->     FROM artists
        ->    WHERE artist_name = @artist_name
        -> );
    Query OK, 1 row affected (0.01 sec)
    Records: 1  Deleted: 0  Skipped: 0  Warnings: 0
    
    mysql> select * from albums;                                                                                       
    +----------+------------+-----------+
    | album_pk | album_name | artist_pk |
    +----------+------------+-----------+
    |        1 | Violator   |         1 |
    +----------+------------+-----------+
    1 row in set (0.00 sec)