sql postgresql greatest-n-per-group amazon-redshift

Comparing different rows in PostgreSQL for each Id

A few columns of my table look like this:

Id   Code   date        latest
1    T     2014-10-04    0
2    B     2014-10-19    0
2    B     2014-10-26    0
1    S     2014-10-05    0
1    T     2014-10-06    0
1    T     2014-10-08    1
2    P     2014-10-27    1

I am tracking all changes made for each Id. Whenever there is a change, I insert a new row and update the latest column.

For each Id, I want to find the last code where latest is 0. That code must also differ from the existing code (latest = 1). So for id = 1, the answer cannot be:

 Id   Code
 1    T

because for id = 1, T is the existing code (latest = 1).
So ideally my output should look like:

Id    Code
 1     S
 2     B

I think I can get the latest value for code for each id where latest = 0.

But how do I make sure that it is not equal to the existing code value (latest = 1)?

Solution

Works in Postgres:

SELECT DISTINCT ON (t0.id)
       t0.id, t0.code
FROM   tbl t0
LEFT   JOIN tbl t1 ON t1.code = t0.code
                  AND t1.id = t0.id
                  AND t1.latest = 1
WHERE  t0.latest = 0
AND    t1.code IS NULL
ORDER  BY t0.id, t0.date DESC;

I use the combination LEFT JOIN / IS NULL to remove siblings of rows with latest = 1, i.e. rows with the same id and code. There are various ways to do this:

Select rows which are not present in other table
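One of those alternatives is NOT EXISTS. Here is a minimal sketch against the sample data from the question. The temp table definition and its column types are assumptions made for illustration; only the table name tbl and the column names come from the query above.

```sql
-- Assumed setup: column types are guesses based on the sample data.
CREATE TEMP TABLE tbl (id int, code text, date date, latest int);

INSERT INTO tbl (id, code, date, latest) VALUES
  (1, 'T', '2014-10-04', 0),
  (2, 'B', '2014-10-19', 0),
  (2, 'B', '2014-10-26', 0),
  (1, 'S', '2014-10-05', 0),
  (1, 'T', '2014-10-06', 0),
  (1, 'T', '2014-10-08', 1),
  (2, 'P', '2014-10-27', 1);

-- Same logic as the LEFT JOIN / IS NULL query, written as an anti-join
-- with NOT EXISTS: keep rows with latest = 0 whose code is not the
-- current code (latest = 1) of the same id, then take the newest per id.
SELECT DISTINCT ON (t0.id)
       t0.id, t0.code
FROM   tbl t0
WHERE  t0.latest = 0
AND    NOT EXISTS (
   SELECT 1
   FROM   tbl t1
   WHERE  t1.id = t0.id
   AND    t1.code = t0.code
   AND    t1.latest = 1
   )
ORDER  BY t0.id, t0.date DESC;
```

With the sample rows this should return (1, S) and (2, B), matching the desired output in the question.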

Details for DISTINCT ON:

Select first row in each GROUP BY group?

Version with CTE and 2x LEFT JOIN

Since Redshift does not seem to support DISTINCT ON:

WITH cte AS (
   SELECT t0.*
   FROM   tbl t0
   LEFT   JOIN tbl t1 ON t1.code = t0.code
                     AND t1.id = t0.id
                     AND t1.latest = 1
   WHERE  t0.latest = 0
   AND    t1.id IS NULL
   )
SELECT c0.id, c0.code
FROM   cte c0
LEFT   JOIN cte c1 ON c1.id = c0.id
                  AND c1.date > c0.date
WHERE  c1.id IS NULL
ORDER  BY c0.id;

SQL Fiddle showing both.
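Another option that avoids DISTINCT ON is the window function row_number(), which both Postgres and Redshift support. This is a sketch under the same assumptions as the queries above (table tbl with columns id, code, date, latest); the aliases sub and rn are made up for this example.

```sql
SELECT id, code
FROM  (
   SELECT t0.id, t0.code
        , row_number() OVER (PARTITION BY t0.id
                             ORDER BY t0.date DESC) AS rn  -- 1 = newest row per id
   FROM   tbl t0
   LEFT   JOIN tbl t1 ON t1.code = t0.code
                     AND t1.id = t0.id
                     AND t1.latest = 1
   WHERE  t0.latest = 0
   AND    t1.id IS NULL   -- drop siblings of the current code before numbering
   ) sub
WHERE  rn = 1
ORDER  BY id;
```

Unlike the self-join on the CTE, which returns every tied row when two rows of the same id share the latest date, this returns exactly one row per id. Which of the tied rows is picked is arbitrary unless you add a tiebreaker to the ORDER BY inside the window.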