I have the following 2 SELECTs:
SELECT * FROM public.app_user WHERE id = 'e31b55bf';
+--------+----+-----------+-----+-----+--------------------------+
|id |name|email |role |bio |created_at |
+--------+----+-----------+-----+-----+--------------------------+
|e31b55bf|Jon |jon@app.com|admin|Hello|2022-01-01 00:00:00.000000|
+--------+----+-----------+-----+-----+--------------------------+
SELECT * FROM history.app_user WHERE id = 'e31b55bf';
+--------+----+--------------+--------+----+--------------------------+
|id |name|email |role |bio |updated_at |
+--------+----+--------------+--------+----+--------------------------+
|e31b55bf|ASDF|test |NULL |NULL|2022-01-02 00:00:00.000000|
|e31b55bf|Test|test@gmail.com|basic |NULL|2022-01-03 00:00:00.000000|
|e31b55bf|NULL|NULL |standard|asdf|2022-01-04 00:00:00.000000|
|e31b55bf|NULL|NULL |mod |NULL|2022-01-05 00:00:00.000000|
+--------+----+--------------+--------+----+--------------------------+
public.app_user
contains the users of my app and history.app_user
contains a record of previous values of the rows of the first. In the example above, the user e31b55bf
was a mod instead of an admin before the 5th of January, an standard user with the bio "asdf" before the 4th, a basic user named "Test" with the email "test@gmail.com" before the 3rd...
I want to come up with a SELECT or a FUNCTION that will tell me what the row looked like at a specific point in time. I believe I have already accomplished it, but my solution looks more complex than it should be. It's also tedious to translate it to other tables: say public.project
and history.project
, for example, where the columns are totally different. I believe a cleaner, easier to read and write solution exists. Can a SQL wizard help me here?
My current solution consists of overriding the values of the current row at public.app_user
with the non-null values of this:
SELECT t.uuid,
t1.name,
t2.email,
t3.role,
t4.bio
FROM (
SELECT uuid,
MIN(CASE WHEN name IS NOT NULL THEN updated_at END) AS name_date,
MIN(CASE WHEN email IS NOT NULL THEN updated_at END) AS email_date,
MIN(CASE WHEN role IS NOT NULL THEN updated_at END) AS role_date,
MIN(CASE WHEN bio IS NOT NULL THEN updated_at END) AS bio_date
FROM history.app_user
WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
GROUP BY uuid
) t
LEFT JOIN history.app_user t1 ON t1.updated_at = t.name_date
LEFT JOIN history.app_user t2 ON t2.updated_at = t.email_date
LEFT JOIN history.app_user t3 ON t3.updated_at = t.role_date
LEFT JOIN history.app_user t4 ON t4.updated_at = t.bio_date
Solution 1 : window function
SELECT DISTINCT ON (uuid)
, uuid
, (array_agg(name) FILTER (WHERE name IS NOT NULL) OVER w)[1] AS name
, (array_agg(email) FILTER (WHERE email IS NOT NULL) OVER w)[1] AS email
, (array_agg(role) FILTER (WHERE role IS NOT NULL) OVER w)[1] AS role
, (array_agg(bio) FILTER (WHERE bio IS NOT NULL) OVER w)[1] AS bio
FROM history.app_user
WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
WINDOW w AS (PARTITION BY uuid ORDER BY updated_at)
array_agg()
is an aggregate function which is used as a window function here.
FILTER (WHERE condition)
is used to exclude the NULL values from the selected rows.
The window is a subset of rows associated to the current row as described in the manual, ie all the existing rows with the same uuid
as stated in the PARTITION BY
clause. The ORDER BY
clause allows to put the earliest non-null value in the first position of the resulting array and which is selected by [1]
.
The main issue of using here window functions is that we get as many rows as the ones filtered by the WHERE
clause. The DISTINCT ON ()
clause excludes the redundant rows from the final result.
The second solution based on the same function used as an aggregate function
instead of a window function
is more accurate in our case.
Solution 2 : aggregate function
SELECT uuid
, (array_agg(name ORDER BY updated_at) FILTER (WHERE name IS NOT NULL))[1] AS name
, (array_agg(email ORDER BY updated_at) FILTER (WHERE email IS NOT NULL))[1] AS email
, (array_agg(role ORDER BY updated_at) FILTER (WHERE role IS NOT NULL))[1] AS role
, (array_agg(bio ORDER BY updated_at) FILTER (WHERE bio IS NOT NULL))[1] AS bio
FROM history.app_user
WHERE updated_at > '2022-01-03 12:00:00.000000' -- Date to check
GROUP BY uuid