Search code examples
mysqlsqloracle-databasepostgresqlretention

How to Calculate Month over Month User Retention based on already active users (not based on user signup date)?


I have a Table which tracks User activity (i-e user started a session at what time ?). This table contains the data from Dec 2018 Till now. I need to calculate monthly retention (not based on signup date) based on user Activity( i-e In December 2018, 500 users were active. Then, how many of them were active in Jan,Feb,Mar....Till Now?Same activity should be performed for Jan 2019,Feb 2019... till now users).

I have tried the hard coded way which is to get the users of Dec 2018 in on Table and then Get the Jan 2019 users in other Table and join both tables based on the user_ids, but for that purpose i have to write a lot of joins. Need a dynamic way to check month over month user retention and for all months after Dec 2018(because the data is available from this month onward).

select A.year_month_id,count(distinct A.user_id) as November_Users,count(distinct B.user_id) as December_Retained_Users 
FROM (
        select date_trunc('month', ua.created_at) as monthly,
        ua.user AS user_id
        FROM     user_activity ua
        WHERE    ua.event_type='StartSession'
        and     cast(ua.created_at as date) between cast('20181201' as date) and cast('20181231' as date)
        GROUP BY 1,2
    ) AS A
left Join 
    (
        select date_trunc('month', ua.created_at) as monthly,
        ua.user AS user_id
        FROM     user_activity ua
        WHERE    ua.event_type='StartSession'
        and     cast(ua.created_at as date) between cast('20190101' as date) and cast('20190131' as date)
        GROUP BY 1,2
    ) AS B 
on A.user_id=B.user_id
group by 1

user_activity Table #

id | user | event_type   | created_at
1  | A1   | StartSession | April 29, 2019, 3:59 AM
2  | A2   | StartSession | December 29, 2018, 1:07 AM
3  | A3   | StartSession | December 9, 2018, 4:59 PM
49 | A31  | StartSession | May 25, 2019, 11:59 AM
100| A46  | StartSession | April 29, 2019, 3:56 AM

Expected Output #

Month |Monthly_Active_Users| Jan_Retained|Feb_Retained|Mar_Retained|.......
Dec   | 500                |  300        |  200       | 330
Jan   | 700                |  N/A        |  450       | 410
Feb   | 1000               |  N/A        |  N/A       | 820
Mar   | 920                |  N/A        |  N/A       | N/A
.
.
.
.
Aug   | 100                | N/A         |    N/A     | N/A

Solution

  • I think this will do the job:

    with t as (
        select distinct user_, to_char(created_at, 'yymm') dt
          from user_activity where event_type = 'StartSession'),
      u as (
        select a.user_, a.dt mth, b.dt dt, count(distinct a.user_) over (partition by a.dt) cnt
          from t a join t b on (a.user_ = b.user_ and b.dt >= a.dt))
    select * from u pivot (count(user_) for dt in (1901, 1902, 1903, 1904)) order by mth
    

    dbfiddle demo

    I assumed that column created_at is date datatype. If not, use cast, whatever which works for you. We need this value to be converted in query to yymm. Also user is reserved word, I used user_ in my query.

    Fill the list in pivot's in clause with all months (1901...1908) and in the future add next months. Pivot does not allow for dynamic syntax here, you have to specify them.

    How it works:

    First - distinct values (user, month) from your table. Then is the most important part - self join which creates tuples of starting month and future months for each user. Also here I added analytical count which is required as second column in your report. Final pivot just aggregates such prepared data.