mysql, group-by, distinct

Count only first unique combination in grouped query


I have a table that looks like this:

| date       | user_id | event_id | message_id |
|------------|---------|----------|------------|
| 2021-08-04 | 1       | 1        | 1          |
| 2021-08-04 | 1       | 1        | 2          |
| 2021-08-04 | 1       | 2        | 3          |
| 2021-08-04 | 2       | 1        | 4          |
| 2021-08-05 | 1       | 1        | 1          |
| 2021-08-05 | 2       | 2        | 5          |

I want to group everything by user_id, date and event_id. The catch is that I want to count unique (user, event, message) combinations and add each one only to the date row where it first occurs. In other words, if the same message_id appears with the same user_id and event_id on different dates, I want to count it just once, in the date-user-event row where that message first occurred. This is what I want to get:

| date       | user_id | event_id | count | count_unique |
|------------|---------|----------|-------|--------------|
| 2021-08-04 | 1       | 1        | 2     | 2            | <--- Unique count is 2 because this is the first date on which these two unique user+event+message combinations are found
| 2021-08-04 | 1       | 2        | 1     | 1            |
| 2021-08-04 | 2       | 1        | 1     | 1            |
| 2021-08-05 | 1       | 1        | 1     | 0            | <--- Unique count is 0 because this message_id already exists for the same user and event on a previous date
| 2021-08-05 | 2       | 2        | 1     | 1            |

It's a bit tricky, and I'm pretty confident it's impossible, but I still need to be sure.

I came up with this query:

SELECT
    date,
    user_id,
    event_id,
    COUNT(*) as count,
    COUNT(DISTINCT message_id) as count_unique
FROM events
GROUP BY user_id, event_id, date

But the result I'm getting is obviously not what I'm aiming for:

| date       | user_id | event_id | count | count_unique |
|------------|---------|----------|-------|--------------|
| 2021-08-04 | 1       | 1        | 2     | 2            |
| 2021-08-04 | 1       | 2        | 1     | 1            |
| 2021-08-04 | 2       | 1        | 1     | 1            |
| 2021-08-05 | 1       | 1        | 1     | 1            | <--- Unique count is 1, because it counts distinct message_ids within the group (row).
| 2021-08-05 | 2       | 2        | 1     | 1            |

So basically I need to somehow ignore date for the distinct count (i.e. count outside the group), and add the count only to the rows (groups) whose date is the first date on which the combination appears.


Solution

  • To compute count_unique, keep only the first occurrence of each message sent by a user for an event.

    This query produces that dataset:

    -- earliest date on which each (user, event, message) combination appears
    select min(date) as date, user_id, event_id, message_id
        from events
        group by user_id, event_id, message_id;
    

    From this it is easy to compute count_unique:

    -- number of combinations whose first occurrence falls in each (date, user, event) group
    select count(*) as count_unique, date, user_id, event_id
        from (
           select min(date) as date, user_id, event_id, message_id
           from events
           group by user_id, event_id, message_id ) e
        group by date, user_id, event_id;
    

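    Now left join this onto a query that counts messages by user_id, event_id and date. Groups that contain no first occurrence have no match on the right side, so coalesce turns the resulting NULL into 0: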

    select a.*, coalesce(b.count_unique, 0) as count_unique
       from (
         -- total rows per (date, user, event)
         select date, user_id, event_id, count(*) as count from events
         group by date, user_id, event_id
      ) a left join (
        -- first occurrences per (date, user, event)
        select count(*) as count_unique, date, user_id, event_id
           from (
              select min(date) as date, user_id, event_id, message_id
              from events
              group by user_id, event_id, message_id ) e
           group by date, user_id, event_id
      ) b on a.date = b.date and
            a.user_id = b.user_id and
            a.event_id = b.event_id;
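
To check the approach above against the sample table from the question, here is a minimal sketch. It runs the same three-level query with Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption made only so the example runs without a server. The function name `run_demo` and the aliases `first_date` and `cnt` are illustrative and not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question: (date, user_id, event_id, message_id)
ROWS = [
    ("2021-08-04", 1, 1, 1),
    ("2021-08-04", 1, 1, 2),
    ("2021-08-04", 1, 2, 3),
    ("2021-08-04", 2, 1, 4),
    ("2021-08-05", 1, 1, 1),
    ("2021-08-05", 2, 2, 5),
]

QUERY = """
SELECT a.date, a.user_id, a.event_id, a.cnt,
       COALESCE(b.count_unique, 0) AS count_unique
FROM (
    -- total rows per (date, user, event)
    SELECT date, user_id, event_id, COUNT(*) AS cnt
    FROM events
    GROUP BY date, user_id, event_id
) a
LEFT JOIN (
    -- combinations whose first occurrence falls on each date
    SELECT first_date, user_id, event_id, COUNT(*) AS count_unique
    FROM (
        -- earliest date per (user, event, message)
        SELECT MIN(date) AS first_date, user_id, event_id, message_id
        FROM events
        GROUP BY user_id, event_id, message_id
    ) e
    GROUP BY first_date, user_id, event_id
) b ON a.date = b.first_date
   AND a.user_id = b.user_id
   AND a.event_id = b.event_id
ORDER BY a.date, a.user_id, a.event_id
"""


def run_demo():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "date TEXT, user_id INTEGER, event_id INTEGER, message_id INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The row for 2021-08-05, user 1, event 1 should come back with count_unique 0, since message 1 was first seen for that user and event on 2021-08-04. Dates are stored here as ISO-formatted text, so MIN compares them correctly as strings; in MySQL a DATE column would serve the same purpose.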