I have a list of different channels that could potentially bring users to a website (organic, SEO, online marketing, etc.). I would like to find an efficient way to count daily active user that comes from the combination of these channels. Each channel has its own table and track its respective users.
The tables looks like the following,
channel A
date user_id
2020-08-01 A
2020-08-01 B
2020-08-01 C
channel B
date user_id
2020-08-01 C
2020-08-01 D
2020-08-01 G
channel C
date user_id
2020-08-01 A
2020-08-01 C
2020-08-01 F
I want to know the following combinations
However, when there are a lot of channels (I have around 8 channels) the combination is a lot. What I've done roughly is as simple as this (this one includes channel A)
SELECT
a.date,
COUNT(DISTINCT IF(b.user_id IS NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a,
COUNT(DISTINCT IF(b.user_id IS NOT NULL AND c.user_id IS NULL, a.user_id, NULL)) AS dau_a_b,
...
FROM a LEFT JOIN b ON a.user_id = b.user_id AND a.date = b.date
LEFT JOIN c ON a.user_id = c.user_id AND a.date = c.date
GROUP BY 1
but extremely tedious when the total channels is 8 (28 variations for 2 combinations, 56 for 3, 70 for 4, and many more).
Any smart ideas to solve this? I was thinking to use FULL OUTER JOIN
but can't seem to get the grasp out of it. Answers really appreciated.
I would approach this with union all
and two levels of aggregation:
select date, channels, count(*) as num_users
from (select date, user_id, string_agg(channel order by channel) as channels
from ((select distinct date, user_id, 'a' as channel from a) union all
(select distinct date, user_id, 'b' as channel from b) union all
(select distinct date, user_id, 'c' as channel from c)
) abc
group by date, user_id
) c
group by date, channels;