Search code examples
sqlvertica

SQL counting distinct users over a growing timeframe


I don't think I properly titled this, but in essence I'm wanting to be able to count distinct users but have those previous distinct users be considered as time goes on. As an example, say we have a dataset of user purchases over time:

Date    | User 
-----------------
2/3/22  |   A  
2/4/22  |   B
2/22/22 |   C
3/2/22  |   A
3/4/22  |   D
3/15/22 |   A
4/30/22 |   B

Generally, if I were to count distincts grouped by months as would be normal we would get:

Date    | Count 
-----------------
2/1/22  |   3  
3/1/22  |   2
4/1/22  |   1

But what I'm really wanting to see would be how the total number of distinct users increases over the time period.

Date    | Count 
-----------------
2/1/22  |   3  
3/1/22  |   4
4/1/22  |   4

As such it would be 3 distinct users for the first month. Then 4 for the second month considering the total number of distinct users grew by one with the addition of "D" while "A" isn't counted because it was already recognized as a distinct user in the previous month. The third month would then still be 4 because no new distinct user performed an action that month.

Any help would be greatly appreciated (even if it is just a better title so that it reaches more people more appropriately haha)


Solution

  • here's a solution based on running sum in Postgres that should translate well to Vertica.

    select  date_trunc('month', "Date") as "Date"
           ,sum(count(case rn when 1 then 1 end)) over (order by date_trunc('month', "Date")) as "Count"
    from    (
            select  "Date"
                   ,"User"
                   ,row_number()  over(partition by "User" order by "Date") as rn
            from    t
            ) t
    group by  date_trunc('month', "Date")
    order by "Date"
    
    Date Count
    2022-02-01 00:00:00 3
    2022-03-01 00:00:00 4
    2022-04-01 00:00:00 4

    Fiddle