Search code examples
mysqlsqldatabasejoinrdbms

Update rows if total column sum is less than X


I'm trying to figure out how to update rows in the table if their total size less than x.

Here is my setup:

create table test_limit (
       id int not null auto_increment primary key,
       folder varchar(255),
       status varchar(32) DEFAULT 'awaiting',
       size bigint unsigned default 0,
       request_id varchar(32)
)
ENGINE=InnoDB;

insert into test_limit
  (folder, status, size)
values
  ('/tmp/AAA/bar', 'awaiting', 200 ),
  ('/tmp/AAA/bar', 'awaiting', 200 ),
  ('/tmp/AAA/bar', 'awaiting', 200 ),
  ('/tmp/BBB/bar', 'awaiting', 200 ),
  ('/tmp/BBB/bar', 'awaiting', 200 );

I have a table with 5 rows, each row has a size what I want to do is to update a group of rows that:

  • has the same folder value
  • status is not in in_progress or created
  • has total limit of 400

I came up with the following update command:

SET @request_id='bbb';
UPDATE test_limit t1
  JOIN
     ( SELECT folder FROM test_limit WHERE status = 'awaiting' GROUP BY folder limit 1) t2
    ON t1.folder = t2.folder
  LEFT JOIN
     ( SELECT folder FROM test_limit WHERE status IN ('in_progress', 'created') GROUP BY folder limit 1) t3
    ON t1.folder = t3.folder
  JOIN
     ( SELECT id, @total := @total + size AS total  FROM (test_limit, (select @total := 0) t)  WHERE @total < 400 and status='awaiting') t4
    ON t1.id=t4.id
  SET t1.status = 'in_progress',
      t1.request_id = @request_id
  WHERE t1.status = 'awaiting' AND t3.folder is NULL;

But the problem is it is working first time, but doesn't work any other times:

mysql> select * from test_limit;
+----+--------------+-------------+------+------------+
| id | folder       | status      | size | request_id |
+----+--------------+-------------+------+------------+
|  1 | /tmp/AAA/bar | in_progress |  200 | bbb        |
|  2 | /tmp/AAA/bar | in_progress |  200 | bbb        |
|  3 | /tmp/AAA/bar | awaiting    |  200 | NULL       |
|  4 | /tmp/BBB/bar | awaiting    |  200 | NULL       |
|  5 | /tmp/BBB/bar | awaiting    |  200 | NULL       |
+----+--------------+-------------+------+------------+
5 rows in set (0.07 sec)

UPDATE:

The above result is correct for the first run. What I want to achieve in the second run (say request_id = 'aaa' ):

mysql> select * from test_limit;
+----+--------------+-------------+------+------------+
| id | folder       | status      | size | request_id |
+----+--------------+-------------+------+------------+
|  1 | /tmp/AAA/bar | in_progress |  200 | bbb        |
|  2 | /tmp/AAA/bar | in_progress |  200 | bbb        |
|  3 | /tmp/AAA/bar | awaiting    |  200 | NULL       |
|  4 | /tmp/BBB/bar | in_progress |  200 | aaa        |
|  5 | /tmp/BBB/bar | in_progress |  200 | aaa        |
+----+--------------+-------------+------+------------+
5 rows in set (0.07 sec)

And in the third run it should not update anything because all the values are "in_progress".

How can I achieve this?


Solution

  • got it took me a while to think through the logic. Here is the sql fiddle http://sqlfiddle.com/#!9/227dd0/1

    UPDATE test_limit u
    JOIN
    (
      SELECT
        t1.*
        ,f.NonAwaitingFolderTotal
        ,(@runtot := @runtot + t1.size) as RunningTotal
      FROM
        (
          SELECT
            folder
            ,SUM(CASE WHEN status <> 'awaiting' THEN size ELSE 0 END) as NonAwaitingFolderTotal
          FROM
            test_limit t
          GROUP BY
            folder
          HAVING
            SUM(CASE WHEN status <> 'awaiting' THEN size ELSE 0 END) <= 400
          ORDER BY
            NonAwaitingFolderTotal, folder
          LIMIT 1
        ) f
        INNER JOIN test_limit t1
        ON f.folder = t1.folder
        CROSS JOIN (SELECT @runtot:=0) var
      WHERE
        t1.status = 'awaiting'
    )  t2
    ON u.id = t2.id
    AND (t2.NonAwaitingFolderTotal + t2.RunningTotal) <= 400
    SET
      u.status = 'in_progress'
      ,u.request_id = @request_id
    ;
    

    The logic goes like this

    • find out folder to use and find the Non Awaiting Total Size that is currently in that folder. Then select a folder by the lowest non awaiting size (in_progress,created) and if tied by folder name and then limit 1.
    • Get a Running Total of All awaiting records in that folder to be used to determine which rows can be updated before hitting the max allowed.
    • Do the update with a join to the results of the running total query where Total Size of the NonAwaiting records + the running total of that record are less than the 400 maximum.

    And just because I want to keep this around somewhere the main issue was the running total you where using wasn't grouped by the right level. here are a few running total & row number functions I worked through thinking about it.

    SELECT 
      *
      ,(@foldercount := IF(@prevfolder=folder,@foldercount,@foldercount+1)) as FolderNum
      ,(@rownum := @rownum + 1) as RowNum
      ,(@grouprownum := IF(@prevfolder=folder,@grouprownum+1,1)) as GroupRowNum
      ,(@total := IF(@prevfolder=folder,@total + t.size,t.size)) as GroupRunningTotal
      ,(@GroupAwaitRunningTotal := IF(
            @prevfolder=folder
            ,IF(t.status = 'awaiting',@GroupAwaitRunningTotal + t.size,@GroupAwaitRunningTotal)
            ,IF(t.status = 'awaiting',t.size,0)
          )
       ) as GroupAwaitRunningTotal
       ,(@GroupNonAwaitRunningTotal := IF(
            @prevfolder=folder
            ,IF(t.status != 'awaiting',@GroupNonAwaitRunningTotal + t.size,@GroupNonAwaitRunningTotal)
            ,IF(t.status != 'awaiting',t.size,0)
          )
       ) as GroupNonAwaitRunningTotal
      ,(@runtot := @runtot + t.size) as RunningTotal
      ,@prevfolder:=folder
    FROM 
      test_limit t
      CROSS JOIN
      (SELECT @prevfolder:=NULL, @GroupAwaitRunningTotal := 0
         ,@GroupNonAwaitRunningTotal := 0
         ,@total:=0, @rownum:=0, @grouprownum:=0, @runtot:=0, @foldercount:=0) var