Search code examples
sqlsubquerywindow-functionsamazon-athenarunning-count

SQL query to obtain users seen from a particular variabe date


I have a table "days_users" with dates and users (one row per date and user), from where I could extract the information of which particular days a user was seen, or which users were seen on a particular day:

2023-01-01,user1
2023-01-01,user2
2023-01-01,user3
2023-01-02,user2
2023-01-02,user4
2023-01-03,user1
2023-01-03,user4

I need to compute how many new users appear/disappear each day, and I don't know how to do it. A user "appear" on one day means that the user is seen on that day, but was never seen before, and a user "disappear" on one day means that the user was seen on that day but was never seen after that date.

I thought that a way to start is to create a view of users, first_date_seen, last_date_seen as follows:

user_first_last AS (
SELECT user, min(date) AS first_date_seen, max(date) AS last_date_seen FROM days_users
GROUP BY 1
)

And then count all the users that appear from a particular date until the end

SELECT date, COUNT(DISTINCT user) as num_appearing_users 
FROM user_first_last WHERE first_date_seen = {date} AND last_date_seen = '2023-03-01' 
GROUP BY 1
ORDER BY 1

And similarly for the disappearing users

SELECT date, COUNT(DISTINCT user) as num_disappearing_users 
FROM user_first_last WHERE first_date_seen = '2023-01-01' AND last_date_seen = {date} 
GROUP BY 1
ORDER BY 1

But please note the {date} between curly braces: I would like this date to be the same as the date in the query, i. e. both dates in bold should be the same:

SELECT 
    **date**, COUNT(DISTINCT user) AS num_disappearing_users 
FROM user_first_last 
WHERE first_date_seen = '2023-01-01' 
  AND last_date_seen = **date** 

How can I achieve this?


Solution

  • You can do this:

    with dates as(
    select distinct date from days_users),
    usg as (
      select min(date) first_date,
        max(date) last_date,
        user
        from days_users
        GROUP BY user
    )
    select date,
      (select count(user)
       from usg
       where usg.last_date=dates.date
      ) never_seen_after,
      (select count(user)
       from usg
       where usg.first_date=dates.date
      ) never_seen_before
    from dates 
    

    Here we use CTEs:

    • dates - to get list of dates present in data,
    • usg - list of users with their first and last day.