I am extracting some database data using PostgreSQL via Grafana.
The first SQL gives me a list of user_ids (User2, User3...) who have interacted with a specific User1 I know beforehand:
SELECT DISTINCT user_id AS user
FROM transactions
WHERE (type_id, operation_id) IN (
SELECT type_id, operation_id
FROM transactions
WHERE user_id = 11111 --This is User1
)
AND user_id != 11111;
The second SQL gives me some stats for ONE specific user_id (e.g. User2) from another table:
SELECT
COUNT(DISTINCT operation_id) AS "# of operations"
FROM
operations
WHERE
category = "category1" AND
user_id = 22222 --This is User2
How can I "merge" both queries, so that, for each of the users retrieved from the 1st query (User2, User3... so I don't know them beforehand) I can get the stats on the second query (rather than only for a hardcoded User2)?
Thanks!
JOIN
the tables, merge the lists of conditions, then GROUP BY
. You can also FILTER
what goes into the aggregate functions, like the count(*)
you used: demo at db<>fiddle
SELECT user_id
,COUNT(DISTINCT o.operation_id)
FILTER(WHERE category='category1')AS "# of category1 operations"
,COUNT(DISTINCT o.operation_id)
FILTER(WHERE category='category2')AS "# of category2 operations"
FROM transactions AS t
JOIN operations AS o
USING (user_id)
WHERE (t.type_id, t.operation_id) IN (
SELECT type_id, operation_id
FROM transactions
WHERE user_id = 11111 --This is User1
)
AND t.user_id <> 11111
AND category IN ('category1','category2')
GROUP BY user_id;
user_id | # of operations | # of category1 operations | # of category2 operations |
---|---|---|---|
22222 | 3 | 2 | 0 |
33333 | 1 | 1 | 0 |
Make sure those tables are indexed:
create index on transactions(type_id,operation_id);
create index on transactions(user_id,type_id,operation_id);
create index on operations(user_id,category)include(operation_id);
The demo test on 200k randomised rows in each table shows how these indexes help reduce the execution time from 200ms
down to 5ms
.