Search code examples
mysqlinnodb

Fastest way to trim one column's data in large MySQL table


I've inherited a database that has a lot of problems. One of the ones that I'm currently trying to solve is that many of the tables have character primary keys which are padded with spaces to 10 characters.

For the most part, I've been able to update the tables to remove the padding fairly easily. There is one secondary table, though, that has over ten million records. It has a foreign key to one of the tables with the padded primary key.

I want to update all the values of this foreign key column to be their trimmed value. I tried a simple update query.

UPDATE actions SET foreignkey = TRIM(foreignkey);

This gives me the error "The total number of locks exceeds the lock table size". It looks like there might be a fix for this by changing the innodb_buffer_pool_size, but I decided to try another strategy as well. It struck me that if I recreated the table and selected into it, I would avoid this error and also wouldn't experience slowdown from messing with indexed fields.

CREATE TABLE actions2 LIKE actions;
INSERT INTO actions2 (id, foreignkey, otherfields) SELECT id, TRIM(foreignkey), otherfields FROM actions;

Is this solution going to be significantly faster than the UPDATE solution if I don't disable the table's indexes beforehand? Is there a faster way to do this that I'm missing?

EDIT: The foreign key for this table and primary key for the other table are VARCHAR(10) fields. Also, it is acceptable for me to only load the newest 2 million records into the table in one batch, and slowly fill the rest over time.


Solution

  • I would walk through that table in "chunks", doing perhaps 1000 rows at a time. Here is some pseudo code. (The details depend on the language you wish to write this in.)

    $a = '';  -- assuming this is less than any value
    loop...
        $z = SELECT v FROM main
            WHERE v > $a  ORDER BY v  LIMIT 1000,1;  -- efficient locate stopper
        BEGIN;
        -- Update each table
        UPDATE main SET v = TRIM(v)
            WHERE v > $a AND v <= $z;
        UPDATE table2 SET v = TRIM(v)
            WHERE v > $a AND v <= $z;
        UPDATE table3 SET v = TRIM(v)
            WHERE v > $a AND v <= $z;
        COMMIT;   -- this keeps anyone from stumbling over FKs in transition
        if finished, exit loop
        $a = $z
    end loop
    

    More discussion.

    I picked 1000 somewhat arbitrarily. It might be small enough to have minimal impact on the running system, yet large enough not to take forever.

    Note: If you already have some trimmed values, could you be headed for a "duplicate key"?