I have two tables (simplified to):
+----------------+
| attendances    |
+-----+----------+
| int | user_id  |
+-----+----------+
| int | event_id |
+-----+----------+

+-------------------------+
| events                  |
+------+------------------+
| int  | id               |
+------+------------------+
| date | performance_date |
+------+------------------+
And a simple query:
SELECT count(DISTINCT user_id), events.performance_date
FROM attendances
INNER JOIN events
ON event_id = events.id
GROUP BY performance_date
I only wish to count each user_id once, but the above query only removes the duplicates within each performance_date (allowing them to be duplicated across multiple dates).
Is there a query that can remove duplicate user_ids from the entire result set, and only include the first occurrence (date-wise)? I suspect it might not be possible.
If a user attended an event on 2010-10-10 and again on 2010-10-11, then the results would be:
1, 2010-10-10
Not:
1, 2010-10-10
1, 2010-10-11
Or:
2, 2010-10-10
If another user was added to the above, and they attended on 2010-10-10 and on 2010-10-12, then the results would be:
2, 2010-10-10
1, 2010-10-12
As I say, this may not be possible. The actual output isn't strictly important -- just so long as the unique number of people who attended a particular performance can be derived somehow.
The data will be used to construct a cumulative graph of the growth in the number of unique users by event.
If you want the earliest date per user, you can use aggregation:
select a.user_id, min(e.performance_date) first_event_date
from attendances a
inner join events e on a.event_id = e.id
group by a.user_id
Actually, you might be looking for a histogram, that is, the number of users per earliest event date. You can get this by adding another level of aggregation:
select first_event_date, count(*) no_users
from (
select min(e.performance_date) first_event_date
from attendances a
inner join events e on a.event_id = e.id
group by a.user_id
) t
group by first_event_date
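
For the cumulative graph mentioned in the question, one option is to sum the per-date counts of first-time attendees in date order. The following is only a rough sketch, assuming SQLite (via Python's sqlite3 module) and dates stored as ISO yyyy-mm-dd strings. The sample rows, the third user, and the helper names new_users_by_date and build_sample_db are made up for illustration and are not from the question.

```python
import sqlite3
from itertools import accumulate


def build_sample_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (id INTEGER PRIMARY KEY, performance_date TEXT);
        CREATE TABLE attendances (user_id INTEGER, event_id INTEGER);
        INSERT INTO events VALUES
            (1, '2010-10-10'), (2, '2010-10-11'), (3, '2010-10-12');
        -- user 1: 10th and 11th; user 2: 10th and 12th; user 3: 12th only
        INSERT INTO attendances VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
    """)
    return conn


def new_users_by_date(conn):
    """Count users by the date of the first event each one attended."""
    query = """
        SELECT first_event_date, COUNT(*) AS no_users
        FROM (
            SELECT MIN(e.performance_date) AS first_event_date
            FROM attendances a
            INNER JOIN events e ON a.event_id = e.id
            GROUP BY a.user_id
        ) t
        GROUP BY first_event_date
        ORDER BY first_event_date
    """
    return conn.execute(query).fetchall()


conn = build_sample_db()
rows = new_users_by_date(conn)

# Running total of unique users, in date order.
dates = [date for date, _ in rows]
running_totals = list(accumulate(count for _, count in rows))
cumulative = list(zip(dates, running_totals))
print(cumulative)
```

Dates on which nobody attends for the first time (2010-10-11 in this sample) do not appear in the result. If the graph needs a point for every event, you could left join the events table against this result and carry the previous total forward.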