I'm trying to do something similar to this but in BigQuery. I have several users that may have 1 or more categories. I must find the overlapping within categories. Something like this:
What I want as result is something like this:
That is, for example, only one user has only category D (and no other), two users have categories 10 and 30, and so on.
The main problem is that I have a lot of categories (over 40). Previously I had done something like:
SELECT sum(cat1), sum(cat2), sum(cat3)
FROM table
where cat1 = 0 and cat2 = 1 and cat3 = 0
That way worked, but is too manual and impossible to do it right now because I have lots of categories. Want to use BigQuery if it is possible.
The main problem is that I have a lot of categories (over 40).
Consider below (BigQuery) approach - works for any reasonable amount of categories
execute immediate (
select '''
select * from (
select distinct t1.usr,
t1.categories category, t2.categories category2
from `your_table` t1 left join `your_table` t2
on t1.usr = t2.usr and t1.categories != t2.categories
union all
select usr, any_value(categories) category, any_value(categories) category2
from `your_table`
group by usr
having count(1) = 1
)
pivot (count(usr) cat for category2 in (''' || list || '''))
order by category
'''
from (
select string_agg("'" || categories || "'" order by categories) list
from (select distinct categories from `your_table`)
)
)
if applied to sample data in your question - output is