Search code examples
mysqlcsvimport-csv

MySQL Fastest Way To Import 125000 line CSV?


This is my first time working with MySQL besides a few basic queries on an existing DB, so I'm not great at troubleshooting this.

I have a CSV with 125,000 records that I want to load into MySQL. I got version 8 installed along with workbench. I used the Import Wizard to load my CSV and it started importing. The problem is that it was ~5 hours to get to 30,000 records. From what I read this is a long time and there should be a faster way.

I tried LOAD DATA INFILE but got an error regarding secure-file-priv so I went looking to solve that. The configuration appear to be off for secure-file-priv but it keeps popping up as the error. Now I'm getting "Access denied" errors so I'm just stuck.

I am the admin on this machine and this data doesn't mean anything to anyone so security isn't a concern. I just want to learn how to do this.

Is LOAD DATA INFILE the best way to load his amount of data? Is 20 hours too long for 125000 records? Anyone have any idea what I'm doing wrong?


Solution

  • You don't need to set secure-file-priv if you use LOAD DATA LOCAL INFILE. This allows the client to read the file content on the computer where the client runs, so you don't have to upload the file to the designated directory on the database server. This is useful if you don't have access to the database server.

    But the LOCAL option is disabled by default. You have to enable it in both server and client with the local-infile option in my.cnf on the server, and also using in the MySQL client by using mysql --local-infile.

    In addition, your user must be granted the FILE privilege to load files into a table. See https://dev.mysql.com/doc/refman/8.0/en/privileges-provided.html

    Once it's working, LOAD DATA INFILE should be the fastest way to bulk-load data. I did a bunch of comparative speed tests for a presentation Load Data Fast!

    You may also have some limiting factors with respect to MySQL Server configuration options, or even performance limitations with respect to the computer hardware.

    I think the 5 hours for 30k records is way too long even on modest hardware.

    I tested on a Macbook with builtin SSD storage. Even in my test designed to be as inefficient as possible (open connection, save one row using INSERT, disconnect), I still was able to insert 290 rows/second, or 10k rows in 34 seconds. The best result was using LOAD DATA INFILE, at a rate of close to 44k rows/second, loading 1 million rows in 22 seconds.

    So something is severely underpowered on your database server, or else the Import Wizard is doing something so inefficient I cannot even imagine what it could be.