I have a Table which tracks User activity (i-e user started a session at what time ?). This table contains the data from Dec 2018 Till now. I need to calculate monthly retention (not based on signup date) based on user Activity( i-e In December 2018, 500 users were active. Then, how many of them were active in Jan,Feb,Mar....Till Now?Same activity should be performed for Jan 2019,Feb 2019... till now users).
I have tried the hard coded way which is to get the users of Dec 2018 in on Table and then Get the Jan 2019 users in other Table and join both tables based on the user_ids, but for that purpose i have to write a lot of joins. Need a dynamic way to check month over month user retention and for all months after Dec 2018(because the data is available from this month onward).
select A.year_month_id,count(distinct A.user_id) as November_Users,count(distinct B.user_id) as December_Retained_Users
FROM (
select date_trunc('month', ua.created_at) as monthly,
ua.user AS user_id
FROM user_activity ua
WHERE ua.event_type='StartSession'
and cast(ua.created_at as date) between cast('20181201' as date) and cast('20181231' as date)
GROUP BY 1,2
) AS A
left Join
(
select date_trunc('month', ua.created_at) as monthly,
ua.user AS user_id
FROM user_activity ua
WHERE ua.event_type='StartSession'
and cast(ua.created_at as date) between cast('20190101' as date) and cast('20190131' as date)
GROUP BY 1,2
) AS B
on A.user_id=B.user_id
group by 1
user_activity Table #
id | user | event_type | created_at
1 | A1 | StartSession | April 29, 2019, 3:59 AM
2 | A2 | StartSession | December 29, 2018, 1:07 AM
3 | A3 | StartSession | December 9, 2018, 4:59 PM
49 | A31 | StartSession | May 25, 2019, 11:59 AM
100| A46 | StartSession | April 29, 2019, 3:56 AM
Expected Output #
Month |Monthly_Active_Users| Jan_Retained|Feb_Retained|Mar_Retained|.......
Dec | 500 | 300 | 200 | 330
Jan | 700 | N/A | 450 | 410
Feb | 1000 | N/A | N/A | 820
Mar | 920 | N/A | N/A | N/A
.
.
.
.
Aug | 100 | N/A | N/A | N/A
I think this will do the job:
with t as (
select distinct user_, to_char(created_at, 'yymm') dt
from user_activity where event_type = 'StartSession'),
u as (
select a.user_, a.dt mth, b.dt dt, count(distinct a.user_) over (partition by a.dt) cnt
from t a join t b on (a.user_ = b.user_ and b.dt >= a.dt))
select * from u pivot (count(user_) for dt in (1901, 1902, 1903, 1904)) order by mth
I assumed that column created_at
is date
datatype. If not, use cast, whatever which works for you. We need this value to be converted in query to yymm
. Also user
is reserved word, I used user_
in my query.
Fill the list in pivot's in
clause with all months (1901...1908) and in the future add next months. Pivot does not allow for dynamic syntax here, you have to specify them.
How it works:
First - distinct values (user, month) from your table. Then is the most important part - self join which creates tuples of starting month and future months for each user. Also here I added analytical count which is required as second column in your report. Final pivot just aggregates such prepared data.