Search code examples
mysqlgroup-bymaxdate

MySQL Max of a Date not returning the correct tuple


I have a table "messages", that stores messages sent to people over time, regarding some items.

The structure of the messages table is:

message_id user_id date_sent created_at

For each user, I can have multiple tuples in the table. Some of these messages are already sent, and some are not sent yet.

I'm trying to get the last created message for each user. I'm using max(created_at) and a group_by(user_id), but the associated message_id is not the one associated with the max(created_id) tuple.

Table data:

   message_id | user_id | date_sent | created_at
----------------------------------------------
    1           1       2021-07-01  2021-07-01        
    2           1       2021-07-02  2021-07-02
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04
    5           1       2021-07-22  2021-07-22
    6           1       NULL        2021-07-23
    7           2       NULL        2021-07-29
    8           1       NULL        2021-07-29
    9           3       2021-07-29  2021-07-29

My Select:

select * from messages ma right join 
( SELECT max(mb.created_at), message_id
FROM `messages` mb WHERE mb.created_at <= '2021-07-24' 
group by user_id)
mc on ma.message_id=mc.message_id

the result is

   message_id | user_id | date_sent | created_at
----------------------------------------------      
    5           1       2021-07-22  2021-07-23
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04

I don't know why but for user 1, the message_id returned is not the one associated with the tuple that has the max(created_at).

I was expecting to be: (get the tuple with the max(date_sent) of the select grouped by user_id)

   message_id | user_id | date_sent | created_at
----------------------------------------------      
    6           1       NULL        2021-07-23
    3           2       2021-07-01  2021-07-01
    4           3       2021-07-04  2021-07-04

Any idea? Any help? thank you.


Solution

  • You're stumbling over MySQL's notorious nonstandard extension to GROUP BY. It gives you the illusion you can do things you can't. Example

     SELECT max(created_at), message_id
       FROM messages 
      GROUP BY user_id
    

    actually means

     SELECT max(created_at), ANY_VALUE(message_id)
       FROM messages 
      GROUP BY user_id
    

    where ANY_VALUE() means MySQL can choose any message_id it finds most convenient from among that user's messages. That's not what you want.

    To solve your problem, you need first to use a subquery to find the latest created_at date for each user_id. Fiddle.

                  SELECT user_id, MAX(created_at) created_at
                    FROM messages
                   WHERE created_at <= '2021-07-24'
                   GROUP BY user_id
    

    Then, you need to find the message for the particular user_id created on that date. Use the subquery for that. Fiddle

      SELECT a.*
        FROM messages a
        JOIN (
                  SELECT user_id, MAX(created_at) created_at
                    FROM messages
                   WHERE created_at <= '2021-07-24'
                   GROUP BY user_id
             ) b ON a.user_id = b.user_id AND a.created_at = b.created_at
    

    See how that JOIN works? It pulls out the rows matching the latest date for each user.

    There's a possible optimization. If

    • your message_id is an autoincrementing primary key and
    • you never UPDATE your created_at columns, but only set them to the current date when you INSERT the rows

    then the most recent message for each user_id is also the message with the largest message_id. In that case you can use this query instead. Fiddle

      SELECT a.*
        FROM messages a
        JOIN (
                  SELECT user_id, MAX(message_id) message_id
                    FROM messages
                   WHERE created_at <= '2021-07-24'
                   GROUP BY user_id
             ) b ON a.message_id=b.message_id
    

    Due to the way primary key indexes work, this can be faster.

    You want an ordinary JOIN rather than a RIGHT or LEFT JOIN here: the ordinary JOIN only returns rows that match the ON condition.

    Pro tip almost nobody actually uses RIGHT JOIN. When you want that kind of JOIN, use LEFT JOIN. You don't want that kind of join to solve this problem.