A few columns in my table look like this:
Id  Code  date        latest
1   T     2014-10-04  0
2   B     2014-10-19  0
2   B     2014-10-26  0
1   S     2014-10-05  0
1   T     2014-10-06  0
1   T     2014-10-08  1
2   P     2014-10-27  1
I am tracking all changes made by each Id. Whenever there is a change, I insert a new row and update the latest column.
For each Id, I want to find the last code where latest is 0. That code must also differ from the existing code (the row with latest = 1). So for id = 1, the answer cannot be
Id  Code
1   T
because for id = 1, T is the existing code (latest = 1).
So ideally my output should look like:
Id  Code
1   S
2   B
I think I can get the latest code for each id where latest = 0. But how do I make sure that it is not equal to the existing code (latest = 1)?
Works in Postgres:
SELECT DISTINCT ON (t0.id)
       t0.id, t0.code
FROM   tbl t0
LEFT   JOIN tbl t1 ON t1.code = t0.code
                  AND t1.id = t0.id
                  AND t1.latest = 1
WHERE  t0.latest = 0
AND    t1.code IS NULL
ORDER  BY t0.id, t0.date DESC;
I use the combination of LEFT JOIN / IS NULL to remove siblings of rows with latest = 1. This is one of several ways to write such an anti-join.
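One alternative is NOT EXISTS. The following is a sketch against the same tbl and columns used above. It should return the same rows as the LEFT JOIN / IS NULL version, though it has not been run against the original data:

```sql
-- Same anti-join expressed with NOT EXISTS instead of LEFT JOIN / IS NULL.
-- Assumes the table "tbl" with columns id, code, date, latest from the question.
SELECT DISTINCT ON (t0.id)
       t0.id, t0.code
FROM   tbl t0
WHERE  t0.latest = 0
AND    NOT EXISTS (
   SELECT 1
   FROM   tbl t1
   WHERE  t1.id     = t0.id     -- same id
   AND    t1.code   = t0.code   -- same code as the candidate row
   AND    t1.latest = 1         -- ... and it is the current code
   )
ORDER  BY t0.id, t0.date DESC;  -- most recent remaining row per id
```

Which form performs better depends on the data distribution and available indexes, so it is worth comparing query plans for both.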
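DISTINCT ON (t0.id) keeps only the first row per id in the order given by ORDER BY, which here is the remaining row with the most recent date.
Since Redshift does not seem to support DISTINCT ON, here is an alternative: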
WITH cte AS (
   SELECT t0.*
   FROM   tbl t0
   LEFT   JOIN tbl t1 ON t1.code = t0.code
                     AND t1.id = t0.id
                     AND t1.latest = 1
   WHERE  t0.latest = 0
   AND    t1.id IS NULL
   )
SELECT c0.id, c0.code
FROM   cte c0
LEFT   JOIN cte c1 ON c1.id = c0.id
                  AND c1.date > c0.date
WHERE  c1.id IS NULL
ORDER  BY c0.id;
SQL Fiddle showing both.
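A third option that also avoids DISTINCT ON is the window function row_number(). This is a sketch under the same assumptions (table tbl, columns id, code, date, latest). It is not part of the fiddle and has not been tested on Redshift:

```sql
-- Number the remaining rows per id, newest first, then keep only the first one.
SELECT id, code
FROM  (
   SELECT t0.id, t0.code
        , row_number() OVER (PARTITION BY t0.id
                             ORDER BY t0.date DESC) AS rn
   FROM   tbl t0
   LEFT   JOIN tbl t1 ON t1.code = t0.code
                     AND t1.id = t0.id
                     AND t1.latest = 1
   WHERE  t0.latest = 0
   AND    t1.id IS NULL        -- drop rows matching the current code
   ) sub
WHERE  rn = 1
ORDER  BY id;
```

One difference in behavior: if two remaining rows for the same id share the latest date, row_number() returns an arbitrary one of them, while the self-join on the CTE returns both.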