Search code examples
mysqlperformancegogoroutine

MySQL sequential inserts are slow while threaded inserts are fast - why?


I found that sequentially inserting data into my database is very slow compared to a multi-threaded solution where both insert the same number of rows. Inserting 50000 rows took ~4 mins in my sequential approach and only ~10 seconds with the parallel version.

I use the https://github.com/go-sql-driver/mysql driver.
For the database, I just took the recent version of XAMPP for windows and use the MySQL database with its standard config.

sequential version:

for i := 0; i < 50000; i++ {

        _, err2 := db.Exec("insert into testtable (num, text1, text2) values (?, ?, ?)", i, "txt1", "txt2")

        if err2 != nil {
            fmt.Println(err2)
        }

}

Parallel version:

for i := 0; i < 50; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                _, err2 := db.Exec("insert into testtable (num, text1, text2) values (?, ?, ?)", 1, "txt1", "txt2")
                if err2 != nil {
                    fmt.Println(err2)
                }
            }
        }()
}

Why is the first one so slow compared to the second version?
Any Ideas? Am I maybe using the wrong function for inserting data?


Solution

  • There is a lot of overhead in running an INSERT:

    • Communication between client and server.
    • Parse the INSERT
    • Open the table, etc.
    • Get next AUTO_INCREMENT value.
    • Check for conflicts, deadlocks, etc.
    • Commit the transaction.

    And all of that is done in a single CPU, with waiting for I/O, if necessary.

    You have 50 threads; they ran 24 times as fast.

    But you can do 10 times as good as that -- Batch up the rows into a single INSERT 100 rows at a time. This eliminates much of the overhead, especially the commit. (Going past 100-1000 rows is getting into diminishing returns and other overheads; so stop there.)

    Meanwhile, don't use more threads than, say, twice the number of CPU cores you have. Otherwise, they will just stumble over each other. The may be why 50 threads was only 24 times as fast.