Search code examples
mysqldatabaseperformanceinnodb

Batch MYSQL inserts for performance in DB structure migration


I need to restructure my MYSQL InnoDB database.

At the moment I have a customer table holding 3 product names.

I need to extract these names to a new product table. The product table should hold each name currently held in the customer table and be linked to the customer table via a new customer_product table. While the product names may not be unique, they don't have anything to do with each other, meaning for each customer there will need to be inserted 3 new entries into the product table and 3 new entries into the customer_product table.

So instead of this:

customer
| id | product_name_a | product_name_b | product_name_c |

I need this:

customer
| id |

customer_product
| customer_id | product_id | X3

product
| id | name | X3

I've written the following MYSQL procedure that works:

BEGIN
  DECLARE nbr_of_customers BIGINT(20);
  DECLARE customer_count BIGINT(20);
  DECLARE product_id BIGINT(20);
  DECLARE customer_id BIGINT(20);
  DECLARE product_name_a VARCHAR(500);
  DECLARE product_name_b VARCHAR(500);
  DECLARE product_name_c VARCHAR(500);

  SELECT COUNT(*) FROM customer INTO nbr_of_customers;
  SET customer_count = 0;
  SET product_id = 1;

  WHILE customer_count < nbr_of_customers DO
    SELECT
      customer.id,
      customer.product_name_a,
      customer.product_name_b,
      customer.product_name_c
    INTO
      customer_id,
      product_name_a,
      product_name_b,
      product_name_c
    FROM customer
    LIMIT customer_count,1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_a);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_b);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    INSERT INTO product(id, name)
      VALUES(product_id, product_name_c);
    INSERT INTO customer_product(customer_id, product_id)
      VALUES(customer_id, product_id);
    SET product_id = product_id + 1;

    SET customer_count = customer_count + 1;
  END WHILE;
END;

This is too slow.

I've run this locally and estimate that my ~15k customers would take ~1h to complete. And my VPS server is far slower than that, so it could take upward to 10h to complete.

The problem seem to be the inserts taking a long time. I've would therefore like to store all the inserts during the procedure and execute them all in batch after the loop is complete and I know what to insert.

I there a way to perform all the ~100k inserts in batch to optimize performance, or is there a better way to do it?


FINAL EDIT:

I marked the correct solution based on that it did an excellent job of speeding up the process massively, which was the main focus of the question. In the end I ended up performing the migration using modified production code (in Java), due to the solution's limitations regarding not escaping the inserted strings.


Solution

  • First, use a cursor to process the results of a single query, rather than performing a separate query for each row.

    Then concatenate the VALUES lists into strings that you execute using PREPARE and EXECUTE.

    My code does the inserts in batches of 100 customers, because I expect there's a limit on the size of a query.

    BEGIN
      DECLARE product_id BIGINT(20);
      DECLARE customer_id BIGINT(20);
      DECLARE product_name_a VARCHAR(500);
      DECLARE product_name_b VARCHAR(500);
      DECLARE product_name_c VARCHAR(500);
      DECLARE done INT DEFAULT FALSE;
      DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    
      SET product_id = 1;
    
      OPEN cur;
    
      SET @product_values = '';
      SET @cp_values = '';
    
      read_loop: LOOP
        FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
        IF done THEN
          LEAVE read_loop;
        END IF;
    
        SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "'), ");
        SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, "),");
    
        SET product_id = product_id + 3;
    
        IF product_id % 300 = 1 -- insert every 100 customers
        THEN BEGIN
             SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
             PREPARE stmt1 FROM @insert_product;
             EXECUTE stmt1;
    
             SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
             PREPARE stmt2 FROM @insert_cp;
             EXECUTE stmt2;
    
             SET @product_values = '';
             SET @cp_values = '';
         END IF;
    
      END LOOP;
    
      IF @product_values != '' -- Process any remaining rows
      THEN BEGIN
           SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
           PREPARE stmt1 FROM @insert_product;
           EXECUTE stmt1;
    
           SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
           PREPARE stmt2 FROM @insert_cp;
           EXECUTE stmt2;
    
           SET @product_values = '';
           SET @cp_values = '';
       END IF;
    END;
    

    Beware that, using this solution, the product names will not be properly escaped before inserting. This solution will therefore not work if any of the product names contains special characters, such as single quote '.