Search code examples
mysqlmysql-5.6

How to avoid tuples with duplicate primary key values getting inserted when populating table with random data


I need to populate a table with tuples of randomised data for testing purposes. I need to work with MySQL5.6 due to dependency reasons. As MySQL5.6 doesn't have any sequence generator or similar functionality to help populate random data into tables, I was able to gather info to come up with a rudimentary script to do the same job. Here's that script:

DELIMITER $$
CREATE PROCEDURE InsertRandomRows(IN NumRows INT)
    BEGIN
        DECLARE i INT;
        SET i = 1;
        START TRANSACTION;
        WHILE i <= NumRows DO
            INSERT INTO table_name (id, tel_mobile, created_at, age,
                call_time, created_at, fio, address, comment) VALUES (
                ROUND(RAND() * 100000000),
                ROUND(RAND() * 1000000),
                FROM_UNIXTIME(
                UNIX_TIMESTAMP('2010-04-30 14:53:27') 
                + FLOOR(0 + (RAND() * 63072000)) 
                ),  
                ROUND(RAND() * 1000000),
                ROUND(RAND() * 1000000),
                ROUND(RAND() * 1000000),
                LEFT(CONCAT(
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR ))
                ), 100),
                LEFT(CONCAT(
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR ))
                ), 100),
                LEFT(CONCAT(
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR )),
                    MD5( CAST( RAND() AS CHAR ))
                ), 100)
            );
            SET i = i + 1;
        END WHILE;
        COMMIT;
    END$$
DELIMITER ;


CALL InsertRandomRows(100000);

My testing would require populating table so that its size is in the ballpark of 30GB, for which I decided to populate the table with :

CALL InsertRandomRows(10000000);

The table in question has the (id,created_at) pair as primary key. However, after insertion of 1095600 rows(about 10% of the required row count), the query execution stops due to an attempt to insert a row that has the same (id,created_at) pair as one inserted before. Is there any modification to the script that I can do in order to prevent this? Any insights are greatly appreciated.


Solution

  • Adding the following clause helped resolved my error as per @Honeyboy Wilson's comment.

    ........................
                   LEFT(CONCAT(
                        MD5( CAST( RAND() AS CHAR )),
                        MD5( CAST( RAND() AS CHAR )),
                        MD5( CAST( RAND() AS CHAR )),
                        MD5( CAST( RAND() AS CHAR ))
                    ), 100)
                );
                SET i = i + 1;
    
    ON DUPLICATE KEY UPDATE id=id+ROUND(RAND() * 100);
    ........................