Search code examples
postgresqlbusiness-intelligencewindow-functions

Using PostgreSQL, how can I count the amount of individuals that opened a message in the previous 30 days from the Monday of each week?


Scenario:
I have a table, events_table, that consists of records that are inserted by a webhook based on messages I send to my users:

"column_name" (type)
- "time_stamp" (timestamp with time zone)
- "username" (varchar)
- "delivered" (int)
- "action" (int)

Sample Data:

|    time_stamp   | username | delivered | action |
|:----------------|:---------|:----------|:-------|
|1349733421.460000|  user1   |     1     |  null  |
|1549345346.460000|  user3   |     1     |   1    |
|1524544421.460000|  user1   |     1     |   1    |
|1345444421.570000|  user7   |     1     |  null  |
|1756756761.980000|  user9   |     1     |  null  |
|1234343421.460000|  user171 |     1     |   1    |
|1843455621.460000|  user5   |     1     |   1    |
|      ...        |   ...    |   ...     |  ...   |

The "delivered" column is null by default and 1 when delivered. The "action" column is null by default and is 1 when opened.

Problem:
Using PostgreSQL, how can I count the amount of individuals that opened an email in the previous 30 days from the Monday of each week?

Ideal query results:

|      date       |   count   |
|:----------------|:----------|
|   02/24/2020    | 1,234,123 |
|   02/17/2020    |  234,123  |
|   02/10/2020    | 1,234,123 |
|   02/03/2020    |12,341,213 |
|      ...        |    ...    |

My attempt: This is the extent of what I've tried which gives me count of the previous week:

SELECT
  date_trunc('week', to_timestamp("time_stamp")) as date,
  count("username") as count,
  lag(count(1), 1) over (order by "date") as "count_previous_week"
FROM events_table
WHERE "delivered" = 1
     and "action" = 1
GROUP BY 1 order by 1 desc     

Solution

  • This is my attempt at writing this query.

    First I get the lowest and highest dates from the data set. I add 7 days on to the highest date to make sure I include data up to today.

    I then run generate_series against these 2 values set with an interval of 7 days to give me every single monday between the 2 points (we can't rely on just mondays within your data set in case we have an empty week)

    Then, I simply subquery and aggregate the data based on our generate_series output.

    select
        __weeks.week_begins,
        (
            select
                count(distinct "username")
            from
                events_table
            where
                to_timestamp("time_stamp")::date between week_begins - '30 days'::interval and week_begins
                and "delivered" = 1
                and "action" = 1
        )
    from
        (
            select
                generate_series(_.min_date, _.max_date, '7 days'::interval)::date as week_begins
            from
                (
                    select
                        min(date_trunc('week', to_timestamp("time_stamp"))::date) as min_date
                        max(date_trunc('week', to_timestamp("time_stamp"))::date) as max_date
                    from
                        events_table
                    where
                        "delivered" = 1
                        and "action" = 1
                ) as _
        ) as __weeks
    order by
        __weeks.week_begins
    

    I'm not particularly keen on this query because the query planner visits the same table twice, but I can't think of another way to structure it.