pythonmysqlmariadb

Are there any obvious bottlenecks in my MariaDB UPDATE?


I have a MariaDB table with 28 million rows. I need to update all the rows (locally) with a new value in a column. This is the code in Python to batch update:

update_query = "UPDATE table SET column = %s WHERE `index` = %s"
%time cursor.executemany(update_query, update_data)

column has an index, and index is the primary key.

When I use a batch of 100,000 rows, it takes 50 seconds to run the executemany(). I am not sure if this is very slow or normal. And if it's slow, I don't know where to start for speeding it up.


Solution

  • Unlike INSERT, executemany() can't automatically batch UPDATE queries, so the bottleneck is performing 100,000 queries.

    Instead, use INSERT INTO ... ON DUPLICATE KEY UPDATE .... This will be batched automatically.

    update_query = 'INSERT INTO table (column, index) VALUES (%s, %s) ON DUPLICATE KEY UPDATE column = VALUES(column)'
    %time cursor.executemany(update_query, update_data)