I have a query that collects information from different tables for a particular time range.
Currently, I make requests for every user and every date range separately, but I would like to run it for all time ranges at once, where the time range is every seven days between user_opened_account_at and user_closed_account_at, which is different for every user.
Is there any proper way to do this in one query?
Example: https://www.db-fiddle.com/f/aDFuX4qjzCcUmXe8iipaBM/2
Query:
SELECT
usr.id as user_id,
usr."onboardedAt" as user_opened_account_at,
usr."closedAt" as user_closed_account_at,
'2021-01-01' as start_range_date,
'2021-01-08' as end_range_date,
tx.tx_count as tx_count,
last_user_action.action as last_user_action
FROM "Users" usr
LEFT JOIN (
SELECT
"userId",
COUNT("id") as "tx_count"
FROM "Transactions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
GROUP BY "userId"
) tx ON usr.id = tx."userId"
LEFT JOIN (
SELECT "userId", "action"
FROM "UserActions"
WHERE "createdAt" >= '2021-01-01' AND "createdAt" < '2021-01-08'
ORDER BY "createdAt" DESC
LIMIT 1
) last_user_action ON usr.id = last_user_action."userId"
WHERE usr.id = 1
ORDER BY user_id, start_range_date
Schema:
CREATE TABLE "Users" (
id bigserial PRIMARY KEY,
"onboardedAt" timestamp with time zone,
"closedAt" timestamp with time zone
);
CREATE TABLE "Transactions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
amount numeric(20,8) NOT NULL DEFAULT 0
);
CREATE TABLE "UserActions" (
id bigserial PRIMARY KEY,
"userId" bigint,
"createdAt" timestamp with time zone,
action character varying(255) NOT NULL
);
INSERT INTO "Users" ("onboardedAt", "closedAt") VALUES
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-01-01', '2021-02-01' ),
( '2021-02-01', '2021-03-01' ),
( '2021-02-01', '2021-03-01' );
INSERT INTO "Transactions" ("userId", "createdAt", "amount") VALUES
( 1, '2021-01-02', 100 ),
( 1, '2021-01-08', -100 ),
( 1, '2021-01-15', -200 ),
( 1, '2021-01-22', 200 ),
( 2, '2021-01-02', -100 ),
( 2, '2021-01-02', 100 ),
( 2, '2021-01-15', -200 ),
( 2, '2021-01-16', 200 ),
( 3, '2021-01-02', 100 ),
( 3, '2021-01-08', -100 ),
( 3, '2021-01-15', -200 ),
( 3, '2021-01-22', 200 ),
( 4, '2021-02-02', 50 ),
( 4, '2021-02-08', -100 ),
( 4, '2021-02-15', -200 ),
( 4, '2021-02-22', 200 ),
( 5, '2021-02-02', 200 ),
( 5, '2021-02-08', -400 ),
( 5, '2021-02-15', -600 ),
( 5, '2021-02-22', 200 );
INSERT INTO "UserActions" ("userId", "createdAt", "action") VALUES
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-01', 'PLAY' ),
( 1, '2021-01-02', 'DEPOSIT' ),
( 1, '2021-01-08', 'DEPOSIT' ),
( 1, '2021-01-09', 'PLAY' ),
( 1, '2021-01-15', 'PLAY' ),
( 1, '2021-01-22', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-01', 'PLAY' ),
( 2, '2021-01-02', 'DEPOSIT' ),
( 2, '2021-01-08', 'DEPOSIT' ),
( 2, '2021-01-09', 'PLAY' ),
( 2, '2021-01-15', 'PLAY' ),
( 2, '2021-01-22', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-01', 'PLAY' ),
( 3, '2021-01-02', 'DEPOSIT' ),
( 3, '2021-01-08', 'DEPOSIT' ),
( 3, '2021-01-09', 'PLAY' ),
( 3, '2021-01-15', 'PLAY' ),
( 3, '2021-01-22', 'PLAY' ),
( 4, '2021-02-01', 'DEPOSIT' ),
( 4, '2021-02-01', 'PLAY' ),
( 4, '2021-02-02', 'DEPOSIT' ),
( 4, '2021-02-08', 'DEPOSIT' ),
( 4, '2021-02-09', 'PLAY' ),
( 4, '2021-02-15', 'PLAY' ),
( 4, '2021-02-22', 'PLAY' ),
( 5, '2021-02-01', 'DEPOSIT' ),
( 5, '2021-02-01', 'PLAY' ),
( 5, '2021-02-02', 'PLAY' ),
( 5, '2021-02-08', 'PLAY' ),
( 5, '2021-02-09', 'PLAY' ),
( 5, '2021-02-15', 'DEPOSIT' ),
( 5, '2021-02-22', 'PLAY' );
With all weeks starting on Monday, this would do it (efficiently):
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
ORDER BY id, week_start;
db<>fiddle here
Working with standard weeks makes everything much simpler. We can aggregate in the "many" tables before joining, which is simpler and cheaper. Else, multiple joins can go wrong quickly. See:
Standard weeks make it easier to compare data, too. (Note that first and last week per user can be truncated (span fewer days). But that applies to the last week per user in any case.)
The LATERAL
keyword is assumed automatically in a join to a set-returning function:
CROSS JOIN generate_series(...)
See:
Using DISTINCT ON
to get the last_user_action
per user. See:
I advise to user legal, lower-case identifiers, so double-quoting is not required. Makes your life with Postgres easier.
Added in a comment:
if action is null in a current week, I want to grab most recent from previous weeks
SELECT user_id, "onboardedAt", "closedAt", week_start, tx_count
, last_user_action AS last_user_action_with_null
, COALESCE(last_user_action, max(last_user_action) OVER (PARTITION BY user_id, null_grp)) AS last_user_action
FROM (
SELECT id AS user_id, u."onboardedAt", u."closedAt"
, week_start, COALESCE(t.tx_count, 0) AS tx_count, a.last_user_action
, count(a.last_user_action) OVER (PARTITION BY id ORDER BY week_start) AS null_grp
FROM "Users" u
CROSS JOIN generate_series(date_trunc('week', u."onboardedAt"), u."closedAt", interval '1 week') AS week_start
LEFT JOIN (
SELECT "userId" AS id, date_trunc('week', t."createdAt") AS week_start, count(*) AS tx_count
FROM "Transactions" t
GROUP BY 1, 2
) t USING (id, week_start)
LEFT JOIN (
SELECT DISTINCT ON (1, 2)
"userId" AS id, date_trunc('week', a."createdAt") AS week_start, action AS last_user_action
FROM "UserActions" a
ORDER BY 1, 2, "createdAt" DESC
) a USING (id, week_start)
) sub
ORDER BY user_id, week_start;
db<>fiddle here
Explanation: