I have a MariaDB table with 28 million rows. I need to update all the rows (locally) with a new value in a column. This is the code in Python to batch update:
update_query = "UPDATE table SET column = %s WHERE `index` = %s"
%time cursor.executemany(update_query, update_data)
column
has an index, and index
is the primary key.
When I use a batch of 100,000 rows, it takes 50 seconds to run the executemany()
. I am not sure if this is very slow or normal. And if it's slow, I don't know where to start for speeding it up.
Unlike INSERT
, executemany()
can't automatically batch UPDATE
queries, so the bottleneck is performing 100,000 queries.
Instead, use INSERT INTO ... ON DUPLICATE KEY UPDATE ...
. This will be batched automatically.
update_query = 'INSERT INTO table (column, index) VALUES (%s, %s) ON DUPLICATE KEY UPDATE column = VALUES(column)'
%time cursor.executemany(update_query, update_data)