sqlmysqljoinselectgroup-by

MySQL Joining same table multiple times makes count go weird


Edited to clarify and simplify

I have the following query

select 
  Signups.user_id,
  Count(timeoutTable.user_id) as timeoutCount,
  Count(confirmedTable.user_id) as confirmedCount
from Signups
left join (
  select * from Confirmations where action = 'timeout'
) as timeoutTable on signups.user_id = timeoutTable.user_id
left join (
  select * from Confirmations where action = 'confirmed'
) as confirmedTable on signups.user_id = confirmedTable.user_id
group by 
  Signups.user_id,
  timeoutTable.user_id,
  confirmedTable.user_id

When run with the following input

Signups Table

| user_id | time_stamp          |
| ------- | ------------------- |
| 15      | 2020-07-31 18:26:35 |
| 16      | 2021-05-20 01:38:09 |
| 7       | 2020-08-02 08:45:14 |
| 10      | 2020-06-24 17:13:14 |
| 5       | 2020-06-27 17:59:29 |
| 9       | 2021-11-08 03:05:14 |
| 8       | 2021-12-13 03:38:58 |
| 12      | 2020-09-16 11:17:39 |


Confirmations Table

| user_id | time_stamp          | action    |
| ------- | ------------------- | --------- |
| 7       | 2020-03-31 13:11:43 | timeout   |
| 7       | 2021-03-25 07:40:25 | timeout   |
| 8       | 2020-07-27 19:43:25 | confirmed |
| 8       | 2021-03-07 19:48:06 | timeout   |
| 7       | 2020-01-24 15:43:47 | confirmed |

It outputs:

| user_id | timeoutCount | confirmedCount |
| ------- | ------------ | -------------- |
| 15      | 0            | 0              |
| 16      | 0            | 0              |
| 7       | 2            | 2              |
| 10      | 0            | 0              |
| 5       | 0            | 0              |
| 9       | 0            | 0              |
| 8       | 1            | 1              |
| 12      | 0            | 0              |

I am trying to make it so that user_id 7 has a confirmed of 1 and a timeout of 2 but it sets both values to 2 for some reason. Any help would be greatly appreciated.


Solution

  • Your problem is that you are joining all timeouts to all confirmations. So, for a signup with 2 timeouts and 3 confirmations, you'll get 2 x 3 = 6 rows that you then aggregate.

    When aggregating more than one table, aggregate before joining:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        t.total_timeouts,
        c.total_confirmed,
        coalesce(t.timeoutCount,0) + coalesce(c.total_confirmed, 0) as total_messages
      from signups s
      left join 
      (
        select user_id, count(*) as total_timeouts
        from confirmations
        where action = 'timeout'
        group by user_id
      ) as t on t.user_id = s.user_id
      left join 
      (
        select user_id, count(*) as total_confirmed
        from confirmations
        where action = 'confirmed'
        group by user_id
      ) as c on c.user_id = s.user_id
    )
    order by user_id;
    

    An alternative to this is conditional aggregation here:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        sum(c.action = 'timeout') as total_timeouts,
        sum(c.action = 'confirmed') as total_confirmed,
        sum(c.action = 'timeout') + sum(c.action = 'confirmed') as total_messages
      from signups s
      left join confirmations c on c.user_id = s.user_id
      group by s.user_id
    )
    order by user_id;