Search code examples
mysqlsqlinner-joinin-subquery

MySQL: select row only where closest to date has column value


I want to return all rows that were public in May (2019-05), so if a row was turned to draft (and not back to public) at any point before the end of May, I don't want it. For example:

id | post_id | status | date
-------------------------
 1 | 1       | draft  | 2019-03-25
 2 | 1       | public | 2019-04-02
 3 | 1       | draft  | 2019-05-25
 4 | 2       | draft  | 2019-03-10
 5 | 2       | public | 2019-04-01
 6 | 2       | draft  | 2019-06-01

The desired result for the above would return post_id 2 because its last status change prior to the end of May was to public.

post_id 1 was put back in draft before the end of May, so it would not be included.

I'm not sure how to use the correct join or sub-queries to do this as efficiently as possible.


Solution

  • You seem to want the status as of 2019-05-31. A correlated subquery seems like the simplest solution:

    select t.*
    from t
    where t.date = (select max(t2.date)
                    from t t2
                    where t2.post_id = t.post_id and
                          t2.date <= '2019-05-31'
                   );
    

    To get the ones that are public, just add a WHERE condition:

    select t.*
    from t
    where t.date = (select max(t2.date)
                    from t t2
                    where t2.post_id = t.post_id and
                          t2.date <= '2019-05-31'
                   ) and
          t.status = 'public';
    

    For performance, you want an index on (post_id, date).

    You can also phrase this using a JOIN:

    select t.*
    from t join
         (select t2.post_id, max(t2.date) as max_date
          from t t2
          where t2.date <= '2019-05-31'
          group by t2.post_id
         ) t2
         on t2.max_date = t.date
    where t.status = 'public';
    

    I would expect the correlated subquery to have better performance with the right indexes. However, sometimes MySQL surprises me.