
Best way to count distinct values of one column grouped by a different column?


The database is BigQuery, but its SQL is very similar to most other databases', so I imagine the answer will be universal.

The Table

user_id | date | category_id 
----------------------------
1       | xx   | 10
2       | xx   | 10
2       | xx   | 10
3       | xx   | 10
3       | xx   | 10
3       | xx   | 10
1       | xx   | 11
2       | xx   | 12

I would like to get the count of distinct user_id values per category_id.

So the expected result is:

category_id | distinct_user_count
---------------------------------
10          | 3
11          | 1
12          | 1
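To pin down the intended semantics independently of any database, here is a minimal Python sketch (not part of the original question). It collects the set of user_ids seen for each category_id and counts each set. The names `rows` and `distinct_users_per_category` are illustrative only, and the date column is dropped because it does not affect the count.

```python
from collections import defaultdict

# Sample rows from the table above as (user_id, category_id) pairs.
rows = [(1, 10), (2, 10), (2, 10), (3, 10), (3, 10), (3, 10), (1, 11), (2, 12)]


def distinct_users_per_category(rows):
    """Return {category_id: number of distinct user_ids} for (user_id, category_id) rows."""
    users = defaultdict(set)  # category_id -> set of user_ids seen so far
    for user_id, category_id in rows:
        users[category_id].add(user_id)  # duplicates are absorbed by the set
    return {cat: len(ids) for cat, ids in sorted(users.items())}


print(distinct_users_per_category(rows))  # {10: 3, 11: 1, 12: 1}
```

A set per category is exactly what `COUNT(DISTINCT user_id) ... GROUP BY category_id` expresses in SQL.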

My apologies ahead of time if this has been asked before.

I should add that both of these SQL queries return the same result:

SELECT
  category_id,
  COUNT(DISTINCT user_id)
FROM t
GROUP BY category_id;

SELECT
  category_id,
  COUNT(user_id) OVER (PARTITION BY category_id)
FROM t
GROUP BY category_id, user_id;

Both produce this unexpected result:

category_id | distinct_user_count
---------------------------------
10          | 1
11          | 1
12          | 1
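For reference, a sketch that runs the second query step by step on the sample rows. It assumes SQLite as a stand-in for BigQuery (window functions need SQLite 3.25 or later, which recent Python builds bundle), so it says nothing definitive about BigQuery's behaviour; `pairs` and `windowed` are illustrative names. It shows that the window `COUNT` is evaluated after the `GROUP BY`, so each category's count is repeated once per distinct user.

```python
import sqlite3

# In-memory SQLite copy of the sample table; the date column is left out
# because it plays no part in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO t (user_id, category_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 10), (3, 10), (3, 10), (3, 10), (1, 11), (2, 12)],
)

# Step 1: GROUP BY category_id, user_id leaves one row per (category, user) pair.
pairs = conn.execute(
    "SELECT category_id, user_id FROM t "
    "GROUP BY category_id, user_id ORDER BY category_id, user_id"
).fetchall()
print(pairs)  # [(10, 1), (10, 2), (10, 3), (11, 1), (12, 2)]

# Step 2: the window COUNT counts those grouped rows within each category
# and repeats the count on every row of that category.
windowed = conn.execute(
    "SELECT category_id, COUNT(user_id) OVER (PARTITION BY category_id) "
    "FROM t GROUP BY category_id, user_id ORDER BY category_id"
).fetchall()
print(windowed)  # [(10, 3), (10, 3), (10, 3), (11, 1), (12, 1)]
```

The repeated `(10, 3)` rows are why the window-function approach needs a `DISTINCT` on top.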

Solution

  • Try this:

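    -- GROUP BY leaves one row per (category_id, user_id) pair; the window
    -- COUNT counts those rows per category, and DISTINCT drops the repeats.
    SELECT DISTINCT
      category_id,
      COUNT(user_id) OVER (PARTITION BY category_id) AS users_per_id
    FROM t
    GROUP BY category_id, user_id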
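A minimal check of the answer's query against the sample rows, again assuming SQLite 3.25+ as a stand-in for BigQuery. The plain `COUNT(DISTINCT ...)` aggregate is run alongside it for comparison; `window_version` and `aggregate_version` are illustrative names.

```python
import sqlite3

# In-memory SQLite copy of the sample table (date column omitted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO t (user_id, category_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 10), (3, 10), (3, 10), (3, 10), (1, 11), (2, 12)],
)

# The answer's query: window count over the (category, user) groups,
# with DISTINCT collapsing the repeated per-category rows.
window_version = conn.execute(
    """
    SELECT DISTINCT
      category_id,
      COUNT(user_id) OVER (PARTITION BY category_id) AS users_per_id
    FROM t
    GROUP BY category_id, user_id
    ORDER BY category_id
    """
).fetchall()

# Plain aggregate: one group per category, counting distinct user_ids.
aggregate_version = conn.execute(
    "SELECT category_id, COUNT(DISTINCT user_id) FROM t "
    "GROUP BY category_id ORDER BY category_id"
).fetchall()

print(window_version)     # [(10, 3), (11, 1), (12, 1)]
print(aggregate_version)  # [(10, 3), (11, 1), (12, 1)]
```

Both match the expected table in SQLite. The aggregate form is arguably the more direct of the two, since it states "distinct users per category" in one expression instead of computing a count per row and then deduplicating.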