Search code examples
postgresql

Aggregating by week with a partial starting week in Postgres


I have a query that groups data by week. I would like to include a partial week at the start and end of the result but the partial week does not count the records.

SELECT
    series_interval,
    coalesce(conversion_count, 0) AS conversion_count
FROM (
      SELECT
        series::date AS series_interval
      FROM
        generate_series(
          date_trunc('week', to_date('2024-12-10', 'YYYY-MM-DD')), -- Added 6 days to start date to get following week
          date_trunc('week', to_date('2024-12-23', 'YYYY-MM-DD')),
          INTERVAL '1 week'
        ) AS series
      UNION
      SELECT to_date('2024-12-04', 'YYYY-MM-DD')
    ) AS series -- Start date (middle of week)
    LEFT JOIN (
    SELECT
        date_trunc('week', "conversions"."created_at") AS agg_interval,
        count("conversions"."id") AS conversion_count
    FROM
        "conversions"
        INNER JOIN "commissions" ON "commissions"."conversion_id" = "conversions"."id"
    WHERE
        "conversions"."created_at" >= '2024-12-02' 
             AND "conversions"."created_at" <= '2024-12-23 23:59:59.999999'
    GROUP BY
        agg_interval
) c ON c."agg_interval" = series."series_interval";

The date buckets are:

4th => 8th (5 days)
9th => 15th (7 days)
16th => 22nd (7 days)
23rd => 23rd (1 day - restricted by WHERE so final partial week is what I want)

And the query result is:

series_interval conversion_count
2024-12-04 0
2024-12-09 411
2024-12-16 368
2024-12-23 32

I understand why I get 0 for the first row 0 - because 2024-12-04 is not a Monday and this is how the date truncation / grouping works.

How can I get the count for 4th => 8th?


Solution

  • For simplicity, I'l use CTE params to set start_date and end_date for query. Further, the literal constants are not used in the query.
    Source data for example see in fiddle .

    Generate series (report frame) include start_date, start date of weeks between start_date and end_date.

    with params as(
      select '2024-12-04 00:00:00.00000'::timestamp start_date
         ,'2024-12-23 23:59:59.999999'::timestamp end_date
    )
    SELECT start_date report_date ,date_trunc('week',start_date) series_interval
    from params p
    
    UNION
    SELECT series::date AS report_date,series::date AS series_interval
    FROM params p
    cross join generate_series(
              date_trunc('week',start_date+interval '6 day'), -- Added 6 days to start date to get following week
              date_trunc('week', end_date),
              INTERVAL '1 week'
            ) AS series
    
    report_date series_interval
    2024-12-04 00:00:00 2024-12-02 00:00:00
    2024-12-16 00:00:00 2024-12-16 00:00:00
    2024-12-09 00:00:00 2024-12-09 00:00:00
    2024-12-23 00:00:00 2024-12-23 00:00:00

    Column report_date is used as output, series_interval - for join with grouped data.
    Extra dates are cut off by the filter WHERE created_at between start_date and end_date.

    with params as(
      select '2024-12-04 00:00:00.00000'::timestamp start_date
         ,'2024-12-23 23:59:59.999999'::timestamp end_date
    )
    SELECT report_date::date, series_interval,
        coalesce(conversion_count, 0) AS conversion_count
    FROM (
        SELECT start_date report_date ,date_trunc('week',start_date) series_interval
        from params p
        UNION
        SELECT series::date AS report_date,series::date AS series_interval
        FROM params p
        cross join generate_series(
              date_trunc('week',start_date+interval '6 day'), -- Added 6 days to start date to get following week
              date_trunc('week', end_date),
              INTERVAL '1 week'
            ) AS series
      ) AS series -- Start date (middle of week)
        LEFT JOIN (
          SELECT
            date_trunc('week', "conversions"."created_at") AS agg_interval,
            count("conversions"."id") AS conversion_count
          FROM "conversions"
          cross join params p
    --        INNER JOIN "commissions" ON "commissions"."conversion_id" = "conversions"."id"
          WHERE "conversions"."created_at" between start_date and end_date
          GROUP BY agg_interval
     ) c ON c."agg_interval" = series."series_interval"
     order by report_date;
    
    report_date series_interval conversion_count
    2024-12-04 2024-12-02 00:00:00 5
    2024-12-09 2024-12-09 00:00:00 7
    2024-12-16 2024-12-16 00:00:00 7
    2024-12-23 2024-12-23 00:00:00 1

    fiddle

    If you want first data bucket would aggregated on the full week (starting on 2nd Dec) - (It's not clear why) use where filter

    WHERE "conversions"."created_at" between date_trunc('week',start_date) and end_date