I have a table that has user_seq_id and no of days a user was active in the program. I want to understand the drop-off funnel. Like how many users were active on day 0 (100%) and on day 1, 2 and so on.
Input table :
create table test (
user_seq_id int ,
NoOfDaysUserWasActive int
);
insert into test (user_seq_id , NoOfDaysUserWasActive)
values (13451, 2), (76453, 1), (22342, 3), (11654, 0),
(54659, 2), (64420, 1), (48906, 5);
I want Day, ActiveUsers, and % Distribution of these users.
One method doesn't use window functions at all. Just a list of days and aggregation:
select v.day, count(t.user_seq_id),
count(t.user_seq_id) / c.cnt as ratio
from (select 0 as day union all select 1 union all select 2 union all select 3 union all select 4 union all select 5
) v(day) left join
test t
on v.day <= t.NoOfDaysUserWasActive cross join
(select count(*) as cnt from test) c
group by v.day, c.cnt
order by v.day asc;
Here is a db<>fiddle.
The mention of window function suggests that you are thinking:
select NoOfDaysUserWasActive,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) as cnt,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) / sum(count(*)) over () as ratio
from test
group by NoOfDaysUserWasActive
order by NoOfDaysUserWasActive
The problem is that this does not "fill in" the days that are not explicitly in the original data. If that is not an issue, then this should have better performance.