The database is BigQuery, but it is very similar to most databases, so I would imagine the answer will be universal.
The Table
user_id | date | category_id
----------------------------
1 | xx | 10
2 | xx | 10
2 | xx | 10
3 | xx | 10
3 | xx | 10
3 | xx | 10
1 | xx | 11
2 | xx | 12
I would like to get a count of distinct user_ids per category_id.
So the answer would produce:
category_id | distinct_user_count
---------------------------------
10 | 3
11 | 1
12 | 1
My apologies ahead of time if this has been asked before.
I should add that both of these SQL queries return the same result:

SELECT
  category_id,
  COUNT(DISTINCT user_id)
FROM t
GROUP BY category_id

SELECT
  category_id,
  COUNT(user_id) OVER (PARTITION BY category_id)
FROM t
GROUP BY category_id, user_id
Both produce this unexpected result:
category_id | distinct_user_count
---------------------------------
10 | 1
11 | 1
12 | 1
Try this:
SELECT DISTINCT
  COUNT(user_id) OVER (PARTITION BY category_id) AS users_per_id,
  category_id
FROM the_table
GROUP BY category_id, user_id
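
Why this should work: grouping by category_id and user_id first collapses the table to one row per (category_id, user_id) pair. The window COUNT then counts those rows within each category, and DISTINCT removes the repeated rows. Below is a minimal sketch that checks this on the sample table from the question. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for BigQuery (that is my assumption, and BigQuery may differ in details; SQLite needs version 3.25 or later for window functions). The columns are reordered so category_id comes first, and subquery_form is a hypothetical rewrite added only for comparison.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (user_id INTEGER, date TEXT, category_id INTEGER)"
)

# Sample rows copied from the table in the question.
rows = [
    (1, "xx", 10), (2, "xx", 10), (2, "xx", 10),
    (3, "xx", 10), (3, "xx", 10), (3, "xx", 10),
    (1, "xx", 11), (2, "xx", 12),
]
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)

# The query from this answer, with category_id listed first.
window_query = """
SELECT DISTINCT
    category_id,
    COUNT(user_id) OVER (PARTITION BY category_id) AS users_per_id
FROM the_table
GROUP BY category_id, user_id
ORDER BY category_id
"""

# Hypothetical equivalent: deduplicate the pairs first, then count per category.
subquery_form = """
SELECT category_id, COUNT(*) AS users_per_id
FROM (
    SELECT category_id, user_id
    FROM the_table
    GROUP BY category_id, user_id
) AS pairs
GROUP BY category_id
ORDER BY category_id
"""

window_result = conn.execute(window_query).fetchall()
subquery_result = conn.execute(subquery_form).fetchall()

print(window_result)
print(subquery_result)
```

On the sample data, both queries should report 3 users for category 10 and 1 each for categories 11 and 12, which matches the result asked for in the question.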