I have situation where I store all versions of a given entity in my PostgreSQL database. This is implemented with two tables; one table storing the primary key and immutable properties of the entity and a second table storing the mutable properties of the entity. Both tables are insert-only (enforced by a trigger).
The concept can easily be illustrated with an entity User
, stored in the user
and user_details
tables:
Table user
:
id timestamp
1 2018-04-10T12:00:00
2 2018-04-10T12:00:00
Table user_details
:
id user_id username first_name last_name timestamp
1 1 bob Bob Socks 2018-04-10T12:00:01
2 1 bob Bobby Socks 2018-04-10T12:00:02
3 2 alice Alice Jones 2018-04-10T12:00:03
4 1 bob Bobbers Socks 2018-04-10T12:00:04
5 2 alice Alicia Jones 2018-04-10T12:00:05
Both 'id' columns are defined as serial primary keys (strictly incrementing) and I have created an index on user_details (user_id, id DESC)
.
Given a user id I need a quick way to fetch the immutable data in user
and the most recent entry from user_details
. What kind of query would be best suited for this join?
I am generating audit logs for time intervals by first fetching all rows with timestamp
between X and Y and then I fetch the inserted row and its predecessor (same user_id
, closest lower id
) and produce a diff from these. The number of rows inserted between X and Y is often high, so I need to efficiently fetch the current + previous pairs, i.e. given input user_details(5)
, I need to select the join of user(2) + user_details(5)
and user(2) + user_details(3)
. What kind of query would be best suited for this join?
My best results so far has been with these queries:
Query for question 1:
SELECT *
FROM "user" u
JOIN LATERAL (SELECT *
FROM "user_details" ud
WHERE u.id = ud.user_id
ORDER BY id DESC
LIMIT 1
) detail ON TRUE
WHERE u.id IN
(...);
Query for question 2:
SELECT *
FROM "user" u
JOIN LATERAL (SELECT *
FROM "user_details" ud
WHERE u.id = ud.user_id
AND ud.id IN (...)
ORDER BY id DESC
LIMIT 2) ud ON TRUE;
However, both queries end up using nested loops (seen from EXPLAIN ANALYZE
) and take a long time to finish when run with a large number of ids (5000+).
Can I use the user_details (user_id, id DESC)
index in a smart way to first create a CTE of the user_details
ids I need and then join user + user_details
based on this? Can I create a functional index of some sort? Do I need to maintain a predecessor
column in user_details
(or another table) do be able to look up relations of this type efficiently?
Thanks!
SQL Fiddle: http://www.sqlfiddle.com/#!17/5f5f0
Thanks to X and Y for pushing me in the right direction! I ended up using the solution @MichelMilezzi suggested for my first problem and an adaption of the @RadimBača solution for my second problem:
WITH
cte_1 AS (SELECT id, user_id FROM "user_details" WHERE id IN (8999, 9999)),
cte_2 as (SELECT cte_1.id, cte_1.user_id, prev.id AS prev_id, row_number() OVER (PARTITION BY cte_1.id, cte_1.user_id ORDER BY prev.id DESC) AS rownum FROM "user_details" prev, cte_1 WHERE prev.user_id = cte_1.user_id AND prev.id < cte_1.id)
SELECT main.*, detail.*, cte_2.id AS __id, (detail.id <> cte_2.id) AS __is_predecessor FROM "user" main, "user_details" detail, cte_2
WHERE main.id = cte_2.user_id AND cte_2.rownum = 1 AND (detail.id = cte_2.id OR detail.id = cte_2.prev_id);
Consider using window functions
SELECT *
FROM "user" u
JOIN
(
SELECT row_number() over(partition by user_id order by id) rn,
*
FROM "user_details" ud
) t ON t.user_id = u.id
WHERE t.rn = 1
This solution allows you to query also all N rows per group or N-th row per group.