I need to restructure my MYSQL InnoDB database.
At the moment I have a customer
table holding 3 product names.
I need to extract these names to a new product
table. The product
table should hold each name currently held in the customer
table and be linked to the customer
table via a new customer_product
table. While the product names may not be unique, they don't have anything to do with each other, meaning for each customer
there will need to be inserted 3 new entries into the product
table and 3 new entries into the customer_product
table.
So instead of this:
customer
| id | product_name_a | product_name_b | product_name_c |
I need this:
customer
| id |
customer_product
| customer_id | product_id | X3
product
| id | name | X3
I've written the following MYSQL procedure that works:
BEGIN
DECLARE nbr_of_customers BIGINT(20);
DECLARE customer_count BIGINT(20);
DECLARE product_id BIGINT(20);
DECLARE customer_id BIGINT(20);
DECLARE product_name_a VARCHAR(500);
DECLARE product_name_b VARCHAR(500);
DECLARE product_name_c VARCHAR(500);
SELECT COUNT(*) FROM customer INTO nbr_of_customers;
SET customer_count = 0;
SET product_id = 1;
WHILE customer_count < nbr_of_customers DO
SELECT
customer.id,
customer.product_name_a,
customer.product_name_b,
customer.product_name_c
INTO
customer_id,
product_name_a,
product_name_b,
product_name_c
FROM customer
LIMIT customer_count,1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_a);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_b);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_c);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
SET customer_count = customer_count + 1;
END WHILE;
END;
This is too slow.
I've run this locally and estimate that my ~15k customers would take ~1h to complete. And my VPS server is far slower than that, so it could take upward to 10h to complete.
The problem seem to be the inserts taking a long time. I've would therefore like to store all the inserts during the procedure and execute them all in batch after the loop is complete and I know what to insert.
I there a way to perform all the ~100k inserts in batch to optimize performance, or is there a better way to do it?
FINAL EDIT:
I marked the correct solution based on that it did an excellent job of speeding up the process massively, which was the main focus of the question. In the end I ended up performing the migration using modified production code (in Java), due to the solution's limitations regarding not escaping the inserted strings.
First, use a cursor to process the results of a single query, rather than performing a separate query for each row.
Then concatenate the VALUES
lists into strings that you execute using PREPARE
and EXECUTE
.
My code does the inserts in batches of 100 customers, because I expect there's a limit on the size of a query.
BEGIN
DECLARE product_id BIGINT(20);
DECLARE customer_id BIGINT(20);
DECLARE product_name_a VARCHAR(500);
DECLARE product_name_b VARCHAR(500);
DECLARE product_name_c VARCHAR(500);
DECLARE done INT DEFAULT FALSE;
DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
SET product_id = 1;
OPEN cur;
SET @product_values = '';
SET @cp_values = '';
read_loop: LOOP
FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
IF done THEN
LEAVE read_loop;
END IF;
SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "'), ");
SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, "),");
SET product_id = product_id + 3;
IF product_id % 300 = 1 -- insert every 100 customers
THEN BEGIN
SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
PREPARE stmt1 FROM @insert_product;
EXECUTE stmt1;
SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
PREPARE stmt2 FROM @insert_cp;
EXECUTE stmt2;
SET @product_values = '';
SET @cp_values = '';
END IF;
END LOOP;
IF @product_values != '' -- Process any remaining rows
THEN BEGIN
SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
PREPARE stmt1 FROM @insert_product;
EXECUTE stmt1;
SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
PREPARE stmt2 FROM @insert_cp;
EXECUTE stmt2;
SET @product_values = '';
SET @cp_values = '';
END IF;
END;
Beware that, using this solution, the product names will not be properly escaped before inserting. This solution will therefore not work if any of the product names contains special characters, such as single quote '
.