Search code examples
sqljoinmariadbcoalescesql-rank

Issues selecting rows with next row's IDs


This question is a follow-up to Select first rank if RANK()+1 overflows with JOINS

The scenario

I have a bunch of predefined messages that my Discord and Twitch bots send to dedicated channels every 30 minutes in round-robin only if X messages where sent by other users between each of the bot's messages. The round-robin is independant from one channel to another.

SQL tables

CREATE TABLE `loop_msg` (
  `msg_id` int(11) NOT NULL,
  `content` varchar(2000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `loop_msg` (`msg_id`, `content`) VALUES
(2, 'Content 2'),
(3, 'Content 3'),
(4, 'Content 4'),
(6, 'Content 6'),
(7, 'Content 7'),
(8, 'Content 8');

CREATE TABLE `loop_msg_status` (
  `channel_id` bigint(20) NOT NULL,
  `msg_id` int(11) NOT NULL DEFAULT 0,
  `inbetween` int(11) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `loop_msg_status` (`channel_id`, `msg_id`, `inbetween`) VALUES
(123456789012345671, 2, 10),
(123456789012345672, 4, 30),
(123456789012345673, 6, 10),
(123456789012345674, 6, 0),
(123456789012345675, 6, 15),
(123456789012345676, 8, 10);

ALTER TABLE `loop_msg`
  ADD PRIMARY KEY (`msg_id`);

ALTER TABLE `loop_msg_status`
  ADD PRIMARY KEY (`channel_id`);

ALTER TABLE `loop_msg`
  MODIFY `msg_id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=9;
COMMIT;

The current query

Now, I need every 30 minutes to get every channel_id with inbetween >= 10 from msg_loop_status, the next message's content from loop_msg and the next msg_id from loop_msg as well since after posting the new message, I will UPDATE that channel_id's row with the new msg_id and set inbetween to 0. More specifically, if we're at the end of loop_message (with the highest msg_id), we need to restart the round-robin from the beginning (the lowest msg_id).

Here is what I currently have working:

WITH cte AS (
        SELECT loop_msg.msg_id
             , channel_id
             , inbetween
             , COALESCE(
                           LEAD(loop_msg.msg_id)        OVER w1
                         , FIRST_VALUE(loop_msg.msg_id) OVER w1
                       ) AS msg_id_next
             , COALESCE(
                           LEAD(content)        OVER w1
                         , FIRST_VALUE(content) OVER w1
                       ) AS content_next
          FROM      loop_msg
          LEFT JOIN loop_msg_status
            ON loop_msg.msg_id = loop_msg_status.msg_id
          WINDOW w1 AS (ORDER BY loop_msg.msg_id)
     )
SELECT channel_id, content_next, msg_id, msg_id_next
  FROM cte
 WHERE channel_id IS NOT NULL AND inbetween >= 10
;

The Fiddle

The problem

The problem

When multiples channels are at the same place (same msg_id for different channel_id in loop_msg_status), the ordering doesn't work correctly and the values for next_content and next_msg_id are the current values.

Expectations

Back to the basics: I have 1 channel_id in loop_msg_status where I send content from loop_msg in roundrobin every 30 minutes if inbetween >= 10. I store the msg_id of the sent content in loop_msg_status for that channel, and 30 minutes later, I query again to see where I'm at and post the next content. For example take this row:

channel_id: 123456789012345672
msg_id: 4
inbetween: 30

When doing my SELECT in msg_loop_status, since inbetween >= 10, this channel_id should be selected with the next msg_id that comes after 4 and its corresponding content from msg_loop. Because there isn't a msg_id=5, the next one is actually 6. So the output would be like this:

channel_id: 123456789012345672
next_msg_id: 6
next_content: Content 6

Applying that same logic to the whole example, I'm expecting the following output (the colums such as the current msg_id and content are not needed but can be added for the sake of the example for readability). 5 of the 6 channels have inbetween >= 10, so for each of these channels I need their next_msg_id and next_content as such:

channel_id         | msg_id | next_msg_id | next_content
-----------------------------------------------
123456789012345671 | 2      | 3           | Content 3
123456789012345672 | 4      | 6           | Content 6
123456789012345673 | 6      | 7           | Content 7
123456789012345675 | 6      | 7           | Content 7
123456789012345676 | 8      | 2           | Content 2

Solution

  • Updated to reflect your latest detail, where the next message id should be based on the unique / ordered list of message rows, without regard to the status table.

    The fiddle

    WITH msgs AS (
            SELECT msg_id, content
                 , COALESCE(
                               LEAD(loop_msg.msg_id)        OVER w1
                             , FIRST_VALUE(loop_msg.msg_id) OVER w1
                           ) AS msg_id_next
                 , COALESCE(
                               LEAD(content)        OVER w1
                             , FIRST_VALUE(content) OVER w1
                           ) AS content_next
              FROM      loop_msg
            WINDOW w1 AS (ORDER BY loop_msg.msg_id)
         )
       , cte AS (
            SELECT loop_msg.msg_id
                 , channel_id
                 , inbetween
                 , msg_id_next
                 , content_next
              FROM      msgs AS loop_msg
              LEFT JOIN loop_msg_status
                ON loop_msg.msg_id = loop_msg_status.msg_id
         )
    SELECT channel_id, content_next, msg_id, msg_id_next
      FROM cte
     WHERE channel_id IS NOT NULL AND inbetween >= 10
    ;
    

    The result:

    The result

    The outer join probably is no longer required:

    WITH msgs AS (
            SELECT msg_id, content
                 , COALESCE(
                               LEAD(loop_msg.msg_id)        OVER w1
                             , FIRST_VALUE(loop_msg.msg_id) OVER w1
                           ) AS msg_id_next
                 , COALESCE(
                               LEAD(content)        OVER w1
                             , FIRST_VALUE(content) OVER w1
                           ) AS content_next
              FROM      loop_msg
            WINDOW w1 AS (ORDER BY loop_msg.msg_id)
         )
       , cte AS (
            SELECT loop_msg.msg_id
                 , channel_id
                 , inbetween
                 , msg_id_next
                 , content_next
              FROM msgs AS loop_msg
              JOIN loop_msg_status
                ON loop_msg.msg_id = loop_msg_status.msg_id
         )
    SELECT channel_id, content_next, msg_id, msg_id_next
      FROM cte
     WHERE inbetween >= 10
    ;
    

    and finally:

    WITH msgs AS (
            SELECT msg_id, content
                 , COALESCE(
                               LEAD(loop_msg.msg_id)        OVER w1
                             , FIRST_VALUE(loop_msg.msg_id) OVER w1
                           ) AS msg_id_next
                 , COALESCE(
                               LEAD(content)        OVER w1
                             , FIRST_VALUE(content) OVER w1
                           ) AS content_next
              FROM      loop_msg
            WINDOW w1 AS (ORDER BY loop_msg.msg_id)
         )
    SELECT channel_id, content_next, loop_msg.msg_id, msg_id_next
      FROM msgs AS loop_msg
      JOIN loop_msg_status
        ON loop_msg.msg_id = loop_msg_status.msg_id
     WHERE inbetween >= 10
    ;