Search code examples
mysqlsqlcountmany-to-manydistinct

SQL: count of distinct users with conditions based on many to many table


I have a typical user table in addition to the following feature table

features:

-----------------------
| userId |   feature  |
-----------------------
|   1    |   account  |
|   1    |  hardware  |
|   2    |   account  |
|   3    |   account  |
|   3    |  hardware  |
|   3    |    extra   |
-----------------------

Basically I am trying to get some counts for reporting purposes. In particular, I am trying to find the number of users with accounts and hardware along with the total number of accounts.

I know I can do the following to get the total number of accounts

SELECT 
    COUNT(DISTINCT userId) as totalAccounts
FROM features
WHERE feature = "account";

I am unsure as to how to get the number of users with both accounts and hardware though. In this example dataset, the number I am looking for is 2. Users 1 and 3 have both accounts and hardware.

I would prefer to do this in a single query. Possibly using CASE (example for totalAccounts below):

SELECT
    COUNT(DISTINCT(CASE WHEN feature = "account" THEN userId END)) as totalAccounts,
    COUNT( ? ) as accountsWithHardware
FROM features;

Solution

  • These are two queries - one for the all user count, one for the two-features user count - that you can combine with a cross join:

    select 
      count_all_users.cnt as all_user_count, 
      count_users_having_both.cnt as two_features_user_count
    from
    (
      select count(distinct userid) as cnt
      from features
    ) count_all_users
    cross join
    (
      select count(*) as cnt
      from
      (
        select userid
        from features
        where feature in ('account', 'hardware')
        group by userid
        having count(*) = 2
      ) users_having_both
    ) count_users_having_both;
    

    UPDATE: With some thinking, there is a much easier way. Group by user and detect whether feature 1 and feature 2 exists. Then count.

    select
      count(*) as all_user_count,
      count(case when has_account = 1 and has_hardware = 1 then 1 end)
        as two_features_user_count
    from
    (
      select 
        userid,
        max(case when feature = 'account' then 1 else 0 end) as has_account,
        max(case when feature = 'hardware' then 1 else 0 end) as has_hardware
      from features
      group by userid
    ) users;