I have a complicated case that I am simplifying to a MVCE example. Say we have a MySQL table containing a bunch of clubs and many students. Each student can join multiple clubs and they can also leave at any time.
For MVCE purposes, let's consider the table below:
/* Table: membership */
Club Student Date Status
ABC Alice 1/1/2020 Joined
ABC Bob 1/1/2020 Joined
ABC Charlie 1/2/2020 Joined
XYZ Charlie 1/2/2020 Joined
ABC Alice 1/6/2020 Left
ABC Alice 1/12/2020 Joined
ABC Charlie 1/12/2020 Left
XYZ Alice 1/12/2020 Joined
What statement do I use to find all students in club ABC on 1/9/2020
? The answer should be Charlie
and Bob
. If we change the date to 1/12/2020
, the answer should be Alice
and Bob
since Alice
joined and Charlie
left that day.
Here's what I have tried which doesn't return what I want:
SELECT Student
FROM membership
WHERE
Club = "ABC" AND
(SELECT MAX(Date) WHERE Status="Joined" ) > (select MAX(Date) WHERE Status="Left") ;
Also, the above considers the current membership. How would I make it the membership at a particular date?
If you can match your records of "Joined" to the corresponding "Left" or null
if there is none, then you're pretty much done. The logic I'd apply is that records that corresponds to a "Joined" must be by the same "club" and "student". The date is a bit more tricky, since it's possible to have this scenario:
Club Student Date Status
ABC Alice 2020-01-01 Joined
ABC Alice 2020-01-02 Left
ABC Alice 2020-01-03 Joined
ABC Alice 2020-01-04 Left
ABC Alice 2020-01-05 Joined
The corresponding record can be found by the "Left" record that has a date that is greater than the "Joined", but the earliest of those.
Since the "Joined" don't always have a corresponding record, as in the scenario above where there's none for the last "Joined", I've used a self-join using a LEFT JOIN
:
SELECT joined.club,
joined.student,
joined.date AS joindate,
left_.date AS leavedate
FROM Membership AS joined
LEFT JOIN Membership AS left_ ON joined.club = left_.club -- must match
AND joined.student = left_.student -- must match
AND left_.Status = 'Left' -- match to "Left"
-- Only the minimum date of the dates
-- ..greater or equal to "Joined" date
AND left_.date = (SELECT min(date)
FROM Membership min
WHERE 1 = 1
AND min.club = left_.club
AND min.student = left_.student
AND min.date >= joined.date
AND min.Status = 'Left')
WHERE joined.Status = 'Joined'
Resulting in:
club | student | joindate | leavedate |
---|---|---|---|
ABC | Alice | 2020-01-01 | 2020-01-06 |
ABC | Bob | 2020-01-01 | null |
ABC | Charlie | 2020-01-02 | 2020-01-12 |
XYZ | Charlie | 2020-01-02 | null |
ABC | Alice | 2020-01-12 | null |
XYZ | Alice | 2020-01-12 | null |
To get memberships on a date, you just have to add:
AND joined.date <= '2020-01-06'
AND coalesce(left_.date,'9999-12-31') > '2020-01-06'
or
AND '2020-01-12' BETWEEN joined.date
AND coalesce(date_add(left_.date,interval -1 day),'9999-12-31')
Since BETWEEN
is inclusive, I've subtracted a day. I've used coalesce
to ensure that when there's no corresponding "Left", the date compared is always the maximum date.
However, grouping the records with identical "club", "student", and "Joined"-date the above is equivalent to:
SELECT joined.club,
joined.student,
joined.date AS joindate,
min(left_.date) AS leavedate
FROM Membership AS joined
LEFT JOIN Membership AS left_ ON joined.club = left_.club
AND joined.student = left_.student
AND left_.status = 'Left'
AND left_.date >= joined.date
WHERE joined.status = 'Joined'
GROUP BY joined.club, joined.student, joined.date
Adding the date would then instead be using HAVING
since it's on the result:
HAVING '2020-01-12' BETWEEN joined
AND coalesce(date_add(`left`,interval -1 day),'9999-12-31')
I'd advice you to use DATETIME
or TIMESTAMP
instead of just of DATE
, since you're going to get into a pickle with this scenario:
Club Student Date Status
B52 Peter 2020-10-03 Joined
B52 Peter 2020-10-03 Left
B52 Peter 2020-10-03 Joined
Or
Club Student Date Status
B52 Benny 2020-10-01 Joined
B52 Benny 2020-10-03 Left
B52 Benny 2020-10-03 Joined
The "Left" is greater or equal to any of the "Joined", and it will appear as if neither Peter nor Benny is a member on the 3rd and as a result also not on any later date.
You can also resolve the scenario if you include an ID (which I'd advice you to always do for any table!) for your table and check that a "Left" record always came in later than a "Joined" record, instead of checking that left_.date >= joined.date
.
Making the "Left" date strictly greater then the "Joined" seems to fix the issue, but then no one will effectively be able to leave on the same date that they joined.
Also note that the query using GROUP BY
will eliminate the duplicate "Joined" date for Peter.