Search code examples
mysqlsqlmembership

Get list of members by looking at the their latest status


I have a complicated case that I am simplifying to a MVCE example. Say we have a MySQL table containing a bunch of clubs and many students. Each student can join multiple clubs and they can also leave at any time.

For MVCE purposes, let's consider the table below:

/* Table: membership */
Club    Student     Date        Status
ABC     Alice       1/1/2020    Joined
ABC     Bob         1/1/2020    Joined
ABC     Charlie     1/2/2020    Joined
XYZ     Charlie     1/2/2020    Joined
ABC     Alice       1/6/2020    Left
ABC     Alice       1/12/2020   Joined
ABC     Charlie     1/12/2020   Left
XYZ     Alice       1/12/2020   Joined

What statement do I use to find all students in club ABC on 1/9/2020? The answer should be Charlie and Bob. If we change the date to 1/12/2020, the answer should be Alice and Bob since Alice joined and Charlie left that day.

Here's what I have tried which doesn't return what I want:

SELECT Student
FROM membership
WHERE
Club = "ABC" AND
(SELECT MAX(Date) WHERE Status="Joined" ) > (select MAX(Date) WHERE Status="Left") ;

Also, the above considers the current membership. How would I make it the membership at a particular date?


Solution

  • If you can match your records of "Joined" to the corresponding "Left" or null if there is none, then you're pretty much done. The logic I'd apply is that records that corresponds to a "Joined" must be by the same "club" and "student". The date is a bit more tricky, since it's possible to have this scenario:

    Club    Student     Date          Status
    ABC     Alice       2020-01-01    Joined
    ABC     Alice       2020-01-02    Left
    ABC     Alice       2020-01-03    Joined
    ABC     Alice       2020-01-04    Left
    ABC     Alice       2020-01-05    Joined
    

    The corresponding record can be found by the "Left" record that has a date that is greater than the "Joined", but the earliest of those.

    Since the "Joined" don't always have a corresponding record, as in the scenario above where there's none for the last "Joined", I've used a self-join using a LEFT JOIN:

    SELECT joined.club,
           joined.student,
           joined.date AS joindate,
           left_.date  AS leavedate
      FROM Membership AS joined
      LEFT JOIN Membership AS left_ ON joined.club = left_.club         -- must match
                                   AND joined.student = left_.student   -- must match
                                   AND left_.Status = 'Left'            -- match to "Left"
                                   -- Only the minimum date of the dates
                                   -- ..greater or equal to "Joined" date
                                   AND left_.date = (SELECT min(date)   
                                                       FROM Membership min
                                                      WHERE 1 = 1
                                                        AND min.club = left_.club
                                                        AND min.student = left_.student
                                                        AND min.date >= joined.date 
                                                        AND min.Status = 'Left')
     WHERE joined.Status = 'Joined'
    

    Resulting in:

    club student joindate leavedate
    ABC Alice 2020-01-01 2020-01-06
    ABC Bob 2020-01-01 null
    ABC Charlie 2020-01-02 2020-01-12
    XYZ Charlie 2020-01-02 null
    ABC Alice 2020-01-12 null
    XYZ Alice 2020-01-12 null

    To get memberships on a date, you just have to add:

      AND joined.date <= '2020-01-06'
      AND coalesce(left_.date,'9999-12-31') > '2020-01-06'
    

    or

      AND '2020-01-12' BETWEEN joined.date
                           AND coalesce(date_add(left_.date,interval -1 day),'9999-12-31')
    

    Since BETWEEN is inclusive, I've subtracted a day. I've used coalesce to ensure that when there's no corresponding "Left", the date compared is always the maximum date.

    However, grouping the records with identical "club", "student", and "Joined"-date the above is equivalent to:

    SELECT joined.club,
           joined.student,
           joined.date     AS joindate,
           min(left_.date) AS leavedate
      FROM Membership AS joined
      LEFT JOIN Membership AS left_ ON joined.club = left_.club
                                   AND joined.student = left_.student
                                   AND left_.status = 'Left'
                                   AND left_.date >= joined.date
     WHERE joined.status = 'Joined'
     GROUP BY joined.club, joined.student, joined.date
    

    Adding the date would then instead be using HAVING since it's on the result:

    HAVING '2020-01-12' BETWEEN joined 
       AND coalesce(date_add(`left`,interval -1 day),'9999-12-31')
    

    I'd advice you to use DATETIME or TIMESTAMP instead of just of DATE, since you're going to get into a pickle with this scenario:

    Club    Student     Date          Status
    B52     Peter       2020-10-03    Joined
    B52     Peter       2020-10-03    Left
    B52     Peter       2020-10-03    Joined
    

    Or

    Club    Student     Date          Status
    B52     Benny       2020-10-01    Joined
    B52     Benny       2020-10-03    Left
    B52     Benny       2020-10-03    Joined
    

    The "Left" is greater or equal to any of the "Joined", and it will appear as if neither Peter nor Benny is a member on the 3rd and as a result also not on any later date.

    You can also resolve the scenario if you include an ID (which I'd advice you to always do for any table!) for your table and check that a "Left" record always came in later than a "Joined" record, instead of checking that left_.date >= joined.date.

    Making the "Left" date strictly greater then the "Joined" seems to fix the issue, but then no one will effectively be able to leave on the same date that they joined.

    Also note that the query using GROUP BY will eliminate the duplicate "Joined" date for Peter.


    dbfiddle