Search code examples
mysqlsql-updateinner-jointemp-tables

MySQL update updating more rows than it should be


I'm having issues where an update statement should (to my knowledge) update 5 rows (I have selected 5 rows into a temp table and used an INNER JOIN in the update statement)

however when it comes to running the update statement it updates anything that could have been selected into the temp table not just the joined contents of the temp table itself.

I'm using the FOR UPDATE code in the selection statement to lock the rows, (as I'm expecting multiple queries to be aimed at this table at one time, (NOTE removing that does not change the error effect)

I've generalised the entire code base and it still has the same effect, I've been on this for the past few days and I'm sure that its just something silly I must be doing

Code description

TABLE `data`.`data_table Table to store data and to show it has been taken my the external program.

Stored Procedure `admin`.`func_fill_table debug code to populate the above table.

Stored Procedure `data`.`func_get_data Actual code designed to retrieve a batch size records, mark them as picked up and then return them to an external application.

Basic Setup Code

DROP TABLE IF EXISTS `data`.`data_table`;
DROP PROCEDURE IF EXISTS `admin`.`func_fill_table`;
DROP PROCEDURE IF EXISTS `data`.`func_get_data`;
DROP SCHEMA IF EXISTS `data`;
DROP SCHEMA IF EXISTS `admin`;
CREATE SCHEMA `admin`;
CREATE SCHEMA `data`;

CREATE TABLE `data`.`data_table` (
  `identification_field_1` char(36) NOT NULL,
  `identification_field_2` char(36) NOT NULL,
  `identification_field_3` int(11) NOT NULL,
  `information_field_1` int(11) NOT NULL,
  `utc_action_time` datetime NOT NULL,
  `utc_actioned_time` datetime DEFAULT NULL,
  PRIMARY KEY (`identification_field_1`,`identification_field_2`,`identification_field_3`),
  KEY `NC_IDX_data_table_action_time` (`utc_action_time`)
);

Procedure Creation

DELIMITER //

CREATE PROCEDURE `admin`.`func_fill_table`(
    IN records int
)
BEGIN
    IF records < 1
        THEN SET records = 50;
    END IF;

    SET @processed = 0;
    SET @action_time = NULL;

    WHILE @processed < records
    DO
        SET @action_time = DATE_ADD(now(), INTERVAL FLOOR(RAND()*(45)-10) MINUTE);#time shorter for temp testing
        SET @if_1 = UUID();
        SET @if_2 = UUID();
        INSERT INTO data.data_table(
            identification_field_1
            ,identification_field_2
            ,identification_field_3
            ,information_field_1
            ,utc_action_time
            ,utc_actioned_time)
        VALUES (
             @if_1
            ,@if_2
            ,FLOOR(RAND()*5000+1)
            ,FLOOR(RAND()*5000+1)
            ,@action_time
            ,NULL);

        SET @processed = @processed +1;
    END WHILE;
END
//
CREATE PROCEDURE `data`.`func_get_data`(
    IN batch int
)
BEGIN
    IF batch < 1
        THEN SET batch = 1; /*Minimum Batch Size of 1 */
    END IF;

    DROP TABLE IF EXISTS `data_set`;
    CREATE TEMPORARY TABLE `data_set` 
        SELECT 
            `identification_field_1` as `identification_field_1_local`
            ,`identification_field_2` as `identification_field_2_local`
            ,`identification_field_3` as `identification_field_3_local`
        FROM `data`.`data_table`
        LIMIT 0; /* Create a temp table using the same data format as the table but insert no data*/

    SET SESSION sql_select_limit = batch;

    INSERT INTO `data_set` (
        `identification_field_1_local`
        ,`identification_field_2_local` 
        ,`identification_field_3_local`)
    SELECT 
        `identification_field_1`
        ,`identification_field_2` 
        ,`identification_field_3` 
     FROM `data`.`data_table`
     WHERE
        `utc_actioned_time` IS NULL
            AND `utc_action_time` < NOW()
    FOR UPDATE; #Select out the rows to process (up to batch size (eg 5)) and lock those rows

    UPDATE 
       `data`.`data_table` `dt`
    INNER JOIN 
        `data_set` `ds`
        ON (`ds`.`identification_field_1_local` = `dt`.`identification_field_1`
            AND `ds`.`identification_field_2_local` = `dt`.`identification_field_2`
            AND `ds`.`identification_field_3_local` = `dt`. `identification_field_3`)
    SET `dt`.`utc_actioned_time` = NOW();
    # Update the table to say these rows are being processed

    select ROW_COUNT(),batch;
    #Debug output for rows altered (should be maxed by batch number)

    SELECT * FROM 
        `data`.`data_table` `dt`
    INNER JOIN 
        `data_set` `ds`
        ON (`ds`.`identification_field_1_local` = `dt`.`identification_field_1`
            AND `ds`.`identification_field_2_local` = `dt`.`identification_field_2`
            AND `ds`.`identification_field_3_local` = `dt`. `identification_field_3`);
    # Debug output of the rows that should have been modified

    SELECT 
       `identification_field_1_local`
        ,`identification_field_2_local` 
        ,`identification_field_3_local`
    FROM 
        `data_set`; /* Output data to external system*/

    /* Commit the in process field and allow other processes to access thoese rows again */
END;
//

Run Code

call `admin`.`func_fill_table`(5000);
call `data`.`func_get_data`(5);

Solution

  • You are misusing the sql_select_limit-setting:

    The maximum number of rows to return from SELECT statements.

    It only applies to select statements (to limit results sent to the client), not to insert ... select .... It is intended as a safeguard to prevent users to be accidentally flooded with millions of results, not as another limit function.

    While in general, you cannot use a variable for limit, you can do it in a stored procedure (for MySQL 5.5+):

    The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions: [...]

    • Within stored programs, LIMIT parameters can be specified using integer-valued routine parameters or local variables.

    So in your case, you can simply use

    ...     
    FROM `data`.`data_table`
    WHERE `utc_actioned_time` IS NULL AND `utc_action_time` < NOW()
    LIMIT batch
    FOR UPDATE;