I'm trying to import a huge data set from a csv file (~400MB, 900000 rows). This file contains the information for two relational tables. For example:
["primary_key","name","lastname","phone,"work_id","work_name"]
For every row I have to check whether the primary key already exists, so I can insert or update as needed. I also need to verify the work, because new works can appear in this dataset.
My person table has more columns than the csv file, so I can't simply replace each line with mysqlimport.
Any ideas on how to work with this?
Please read the documentation for LOAD DATA INFILE; it is a good choice for loading data, even very big files. Quoting from the Reference Manual, "Speed of INSERT Statements":

"When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements."
Assuming that your table has more columns than the .csv file, you'd have to write something like this:
load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, ...) -- The matching column list goes here
See my own question on the subject and its answer.
Also, if you need faster inserts, you can run SET foreign_key_checks = 0; before executing load data, and/or alter table yourTable disable keys; before the load, rebuilding the indexes afterwards with alter table yourTable enable keys;
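Putting those hints together, here is an untested sketch of the full sequence; yourTable, the file path, and the column list are placeholders that you would replace with your own:

set foreign_key_checks = 0;           -- skip foreign key validation during the bulk load
alter table yourTable disable keys;   -- stop maintaining non-unique indexes row by row

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines
(col1, col2, col3);

alter table yourTable enable keys;    -- rebuild the disabled indexes in one pass
set foreign_key_checks = 1;           -- turn the checks back on when you are done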
Untested: if your .csv file has more columns than your table, I think you can assign the "exceeding" columns in the file to user variables:
load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, @dummyVar1, @dummyVar2, col4) -- The @dummyVarX variables
-- are simply place-holders for
-- columns in the .csv file that
-- don't match the columns in
-- your table
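For instance, applied to the CSV layout from the question, a sketch could look like the following; the person table and its column names are assumptions taken from the question, and the two work-related fields are read into placeholder variables so they are simply skipped:

-- Assumes person has columns named like the CSV header; @work_id and
-- @work_name just swallow the work columns so they are not loaded here.
load data local infile 'path/to/your/file.csv'
into table person
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- skip the header row
(primary_key, name, lastname, phone, @work_id, @work_name);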