Tags: mysql, database, large-data, myisam, large-data-volumes

MySQL - Executing intensive queries on live server


I'm having some issues updating and inserting millions of rows in a MySQL database. I need to flag 50 million rows in Table A, insert some data from those flagged rows into Table B, then update those same 50 million rows in Table A again. There are about 130 million rows in Table A and 80 million in Table B.

This needs to happen on a live server without denying access to other queries from the website. The problem is that while this stored procedure is running, other queries from the website end up locked and the HTTP requests time out.

Here's the gist of the SP, slightly simplified for illustration:

DELIMITER $$

CREATE DEFINER=`user`@`localhost` PROCEDURE `MyProcedure`(
  totalLimit  int
)
BEGIN
  SET @totalLimit = totalLimit;
  /* Flag rows to be issued (LIMIT with a parameter requires a prepared statement) */
  PREPARE STMT FROM 'UPDATE tableA SET `status` = "Being-Issued" WHERE `status` = "Available" LIMIT ?';
  EXECUTE STMT USING @totalLimit;
  DEALLOCATE PREPARE STMT;
  /* Insert new rows for usage into tableB */
  INSERT INTO tableB (/* my fields */)
    SELECT /* some values from tableA */
    FROM tableA
    WHERE `status` = 'Being-Issued';
  /* Mark the flagged rows in tableA as issued */
  UPDATE tableA SET `status` = 'Issued' WHERE `status` = 'Being-Issued';
END$$

DELIMITER ;
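Since every step of the procedure filters on `status`, the cost of each batch depends heavily on that column being indexed; without one, every UPDATE scans the full 130-million-row table. A minimal sketch (the index name is hypothetical, and whether one already exists isn't stated in the question):

```sql
-- Assumed: no existing index on tableA.`status`.
-- This lets the UPDATE ... WHERE `status` = 'Available' LIMIT ?
-- touch only candidate rows instead of scanning the whole table.
CREATE INDEX idx_tableA_status ON tableA (`status`);
```

Note that a low-cardinality index like this still helps here because the queries only ever look up one status value at a time.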

Solution

  • What about this? It basically calls the original stored procedure in a loop until the total amount needed is reached, with a sleep period between calls (e.g., 2 seconds) to allow other queries to be processed.

    increment is the amount to do at one time (using 10,000 in this case)
    totalLimit is the total amount to be processed
    sleepSec is the amount of time to rest between calls

    DELIMITER $$

    CREATE PROCEDURE BatchedProcedure(
      increment  int,
      totalLimit int,
      sleepSec   int
    )
    BEGIN
      SET @x = 0;
      REPEAT
        /* Pause so other queries get a chance to run */
        SELECT SLEEP(sleepSec);
        SET @x = @x + increment;
        CALL OriginalProcedure(increment);
      UNTIL @x >= totalLimit
      END REPEAT;
    END$$

    DELIMITER ;
    

    Obviously it could use a little math to make sure the last increment doesn't overshoot the total limit when totalLimit isn't evenly divisible by increment, but it appears to work (by "work" I mean other queries from web requests are still processed), and it seems faster overall as well.
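
    The overshoot guard mentioned above can be sketched by clamping the batch size to the rows remaining, using MySQL's LEAST() function; the procedure and parameter names follow the loop above and are otherwise hypothetical:

    ```sql
    SET @x = 0;
    REPEAT
        SELECT SLEEP(sleepSec);
        /* Never request more rows than remain toward totalLimit */
        CALL OriginalProcedure(LEAST(increment, totalLimit - @x));
        SET @x = @x + increment;
    UNTIL @x >= totalLimit
    END REPEAT;
    ```

    For example, with totalLimit = 25000 and increment = 10000, the third pass calls the procedure with 5000 instead of 10000, so exactly 25000 rows are processed.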

    Any insight here? Is this a good idea? Bad idea?