Search code examples
sqlpostgresqlaggregategreatest-n-per-groupgenerate-series

Sql group query results by user id and date ranges dynamically


I have a query that collects information from different tables for a particular time range.

Currently, I make requests for every user and every date range separately, but I would like to run it for all time ranges at once, where the time range is every seven days between user_opened_account_at and user_closed_account_at, which is different for every user.

Is there any proper way to do this in one query?

Example: https://www.db-fiddle.com/f/aDFuX4qjzCcUmXe8iipaBM/2

The result I have: result I have

The result I want to see: Result I want to see

Query:

SELECT 
    usr.id as user_id,
    usr."onboardedAt" as user_opened_account_at,
    usr."closedAt" as user_closed_account_at,
    '2021-01-01' as start_range_date,
    '2021-01-08' as end_range_date,
    tx.tx_count as tx_count,
    last_user_action.action as last_user_action
FROM "Users" usr

LEFT JOIN (
    SELECT 
        "userId",
        COUNT("id") as "tx_count"
    FROM "Transactions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    GROUP BY "userId"
) tx ON usr.id = tx."userId"

LEFT JOIN (
    SELECT "userId", "action"
    FROM "UserActions"
    WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
    ORDER BY "createdAt" DESC 
    LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"

WHERE usr.id = 1
ORDER BY user_id, start_range_date

Schema:

CREATE TABLE "Users" (
    id bigserial PRIMARY KEY,
    "onboardedAt" timestamp with time zone,
    "closedAt" timestamp with time zone
);

CREATE TABLE "Transactions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    amount numeric(20,8) NOT NULL DEFAULT 0
);

CREATE TABLE "UserActions" (
    id bigserial PRIMARY KEY,
    "userId" bigint,
    "createdAt" timestamp with time zone,
    action character varying(255) NOT NULL
);


INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-01-01', '2021-02-01' ), 
    ( '2021-02-01', '2021-03-01' ), 
    ( '2021-02-01', '2021-03-01' );

INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES 
    ( 1, '2021-01-02',  100 ), 
    ( 1, '2021-01-08', -100 ), 
    ( 1, '2021-01-15', -200 ),
    ( 1, '2021-01-22',  200 ),

    ( 2, '2021-01-02', -100 ), 
    ( 2, '2021-01-02',  100 ), 
    ( 2, '2021-01-15', -200 ),
    ( 2, '2021-01-16',  200 ), 

    ( 3, '2021-01-02',  100 ), 
    ( 3, '2021-01-08', -100 ), 
    ( 3, '2021-01-15', -200 ),
    ( 3, '2021-01-22',  200 ),

    ( 4, '2021-02-02',   50 ), 
    ( 4, '2021-02-08', -100 ), 
    ( 4, '2021-02-15', -200 ),
    ( 4, '2021-02-22',  200 ),

    ( 5, '2021-02-02',  200 ), 
    ( 5, '2021-02-08', -400 ), 
    ( 5, '2021-02-15', -600 ),
    ( 5, '2021-02-22',  200 );

INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-01', 'PLAY' ), 
    ( 1, '2021-01-02', 'DEPOSIT' ), 
    ( 1, '2021-01-08', 'DEPOSIT' ), 
    ( 1, '2021-01-09', 'PLAY' ), 
    ( 1, '2021-01-15', 'PLAY' ), 
    ( 1, '2021-01-22', 'PLAY' ), 

    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-01', 'PLAY' ), 
    ( 2, '2021-01-02', 'DEPOSIT' ), 
    ( 2, '2021-01-08', 'DEPOSIT' ), 
    ( 2, '2021-01-09', 'PLAY' ), 
    ( 2, '2021-01-15', 'PLAY' ), 
    ( 2, '2021-01-22', 'PLAY' ), 

    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-01', 'PLAY' ), 
    ( 3, '2021-01-02', 'DEPOSIT' ), 
    ( 3, '2021-01-08', 'DEPOSIT' ), 
    ( 3, '2021-01-09', 'PLAY' ), 
    ( 3, '2021-01-15', 'PLAY' ), 
    ( 3, '2021-01-22', 'PLAY' ), 

    ( 4, '2021-02-01', 'DEPOSIT' ), 
    ( 4, '2021-02-01', 'PLAY' ), 
    ( 4, '2021-02-02', 'DEPOSIT' ), 
    ( 4, '2021-02-08', 'DEPOSIT' ), 
    ( 4, '2021-02-09', 'PLAY' ), 
    ( 4, '2021-02-15', 'PLAY' ), 
    ( 4, '2021-02-22', 'PLAY' ), 

    ( 5, '2021-02-01', 'DEPOSIT' ), 
    ( 5, '2021-02-01', 'PLAY' ), 
    ( 5, '2021-02-02', 'PLAY' ), 
    ( 5, '2021-02-08', 'PLAY' ), 
    ( 5, '2021-02-09', 'PLAY' ), 
    ( 5, '2021-02-15', 'DEPOSIT' ), 
    ( 5, '2021-02-22', 'PLAY' );

Solution

  • With all weeks starting on Monday, this would do it (efficiently):

    SELECT id AS user_id, u."onboardedAt", u."closedAt"
         , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
    FROM   "Users" u
    CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
    LEFT   JOIN (
       SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
       FROM   "Transactions" t
       GROUP  BY 1, 2
       ) t USING (id, week_start)
    LEFT   JOIN (
       SELECT DISTINCT ON (1, 2)
              "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
       FROM   "UserActions" a
       ORDER  BY 1, 2, "createdAt" DESC
       ) a USING (id, week_start)
    ORDER  BY id, week_start;
    

    db<>fiddle here

    Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Else, multiple joins can go wrong quickly. See:

    Standard weeks make it easier to compare data, too. (Note that first and last week per user can be truncated (span fewer days). But that applies to the last week per user in any case.)

    The LATERAL keyword is assumed automatically in a join to a set-returning function:

    CROSS  JOIN  generate_series(...)
    

    See:

    Using DISTINCT ON to get the last_user_action per user. See:

    I advise to user legal, lower-case identifiers, so double-quoting is not required. Makes your life with Postgres easier.

    Use last non-null action

    Added in a comment:

    if action is null in a current week, I want to grab most recent from previous weeks

    SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
         , last_user_action AS last_user_action_with_null
         , COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
    FROM  (
       SELECT id AS user_id, u."onboardedAt", u."closedAt"
            , week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
            , count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
       FROM   "Users" u
       CROSS  JOIN  generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
       LEFT   JOIN (
          SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
          FROM   "Transactions" t
          GROUP  BY 1, 2
          ) t  USING (id, week_start)
       LEFT   JOIN (
          SELECT DISTINCT ON (1, 2)
                 "userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
          FROM   "UserActions" a
          ORDER  BY 1, 2, "createdAt" DESC
          ) a USING (id, week_start)
       ) sub
    ORDER  BY user_id, week_start;
    

    db<>fiddle here

    Explanation: