Search code examples
mysqljoinretention

MySQL query calculate user retention in a certain date-range


I'm trying to select the count of users retaining in a certain date range and that was successful using this query and the table below:

+----------+-------------+
| Field    | Type        |
+----------+-------------+
| id       | varchar(17) |
| log_date | date        |
+----------+-------------+

SELECT last_day.log_date, COUNT(distinct last_day.id) as users_num
FROM (SELECT DISTINCT log_date, id
      FROM `userActivity`) this_day
JOIN (SELECT DISTINCT log_date, id
      FROM `userActivity`) last_day
ON this_day.id = last_day.id
AND this_day.log_date = "2018-10-01"
AND last_day.log_date BETWEEN "2018-10-01" AND "2018-10-30"
GROUP BY log_date;

But the problem that I'm facing is that I want the assume that every day of the date-range is day 0. (similar to the following example):

Note that the first row is the avg of the below results I need to calculate

Note that the first row in the pic is the avg of the below results I need to calculate. Anyone has any idea how can I enhance my query to get the result like the picture?


Solution

  • This solution will only work on MySQL 8.x only since it requires CTEs (Common Table Expressions):

    with digits as (
      select 0 as n union select 1 union select 2 union select 3 union select 4
      union select 5 union select 6 union select 7 union select 8 union select 9
    ),
    series as (
    select d1.n * 100 + d2.n * 10 + d3.n as n -- number series from 0 to 999
    from digits d1
    cross join digits d2
    cross join digits d3
    )
    SELECT last_day.log_date, COUNT(distinct last_day.id) as users_num,
           date_add("2018-10-01", interval s.n day) as current_start
    FROM (SELECT DISTINCT log_date, id
          FROM `userActivity`) this_day
    JOIN (SELECT DISTINCT log_date, id
          FROM `userActivity`) last_day
    ON this_day.id = last_day.id
    cross join series s
    WHERE s.n <= 30
      AND this_day.log_date = date_add("2018-10-01", interval s.n day)
      AND last_day.log_date BETWEEN date_add("2018-10-01", interval s.n day) 
                                AND date_add("2018-10-30", interval s.n day)
    GROUP BY log_date, date_add("2018-10-01", interval s.n day);