Tags: sql, postgresql, greatest-n-per-group, amazon-redshift

Comparing different rows in PostgreSQL for each Id


A few columns in my table look like this:

Id   Code   date        latest
1    T     2014-10-04    0
2    B     2014-10-19    0
2    B     2014-10-26    0
1    S     2014-10-05    0
1    T     2014-10-06    0
1    T     2014-10-08    1
2    P     2014-10-27    1

I am tracking all changes made by each Id. Whenever there is a change, I insert a new row and update the latest column.

For each Id, I want to find the last code where latest is 0. That code must also differ from the current code (latest = 1). So for id = 1, the answer cannot be

 Id   Code
 1    T

because for id = 1, T is the current code (latest = 1).
So ideally my output should look like:

Id    Code
 1     S
 2     B

I think I can get the latest code for each id where latest = 0.

But how do I make sure that it is not equal to the current code (latest = 1)?


Solution

  • Works in Postgres:

    SELECT DISTINCT ON (t0.id)
           t0.id, t0.code
    FROM   tbl t0
    LEFT   JOIN tbl t1 ON t1.code = t0.code
                      AND t1.id = t0.id
                      AND t1.latest = 1
    WHERE  t0.latest = 0
    AND    t1.code IS NULL
    ORDER  BY t0.id, t0.date DESC;
    

    I use the combination of LEFT JOIN / IS NULL to remove siblings of rows with latest = 1, that is, rows with latest = 0 that share id and code with the current row. There are various ways to do this.
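
    One of those alternatives, as a sketch: the same anti-join written with NOT EXISTS instead of LEFT JOIN / IS NULL. It reuses the table and column names from the query above and should return the same rows for the sample data. Which form runs faster depends on the data and the query planner.

    ```sql
    -- Sketch only: same filter as above, expressed with NOT EXISTS.
    SELECT DISTINCT ON (t0.id)
           t0.id, t0.code
    FROM   tbl t0
    WHERE  t0.latest = 0
    AND    NOT EXISTS (            -- drop codes that match the current row
       SELECT 1
       FROM   tbl t1
       WHERE  t1.id     = t0.id
       AND    t1.code   = t0.code
       AND    t1.latest = 1
       )
    ORDER  BY t0.id, t0.date DESC; -- newest remaining row per id wins
    ```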

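    DISTINCT ON (t0.id) then keeps only the first row per id in ORDER BY order, which here is the row with the most recent date.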

    Version with CTE and 2x LEFT JOIN

    Since Redshift does not seem to support DISTINCT ON:

    WITH cte AS (
       SELECT t0.*
       FROM   tbl t0
       LEFT   JOIN tbl t1 ON t1.code = t0.code
                         AND t1.id = t0.id
                         AND t1.latest = 1
       WHERE  t0.latest = 0
       AND    t1.id IS NULL
       )
    SELECT c0.id, c0.code
    FROM   cte c0
    LEFT   JOIN cte c1 ON c1.id = c0.id
                      AND c1.date > c0.date
    WHERE  c1.id IS NULL
    ORDER  BY c0.id;
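
    Another possible variant, assuming window functions are available (Redshift documents row_number()): number the remaining rows per id by date and keep the first. The aliases sub and rn are illustrative only. Unlike the double LEFT JOIN, which returns every row tied for the latest date, this returns exactly one row per id, picked arbitrarily among ties.

    ```sql
    -- Sketch only: greatest-n-per-group via row_number() instead of a self-join.
    SELECT id, code
    FROM  (
       SELECT t0.id, t0.code
            , row_number() OVER (PARTITION BY t0.id
                                 ORDER BY t0.date DESC) AS rn
       FROM   tbl t0
       WHERE  t0.latest = 0
       AND    NOT EXISTS (         -- drop codes that match the current row
          SELECT 1
          FROM   tbl t1
          WHERE  t1.id     = t0.id
          AND    t1.code   = t0.code
          AND    t1.latest = 1
          )
       ) sub
    WHERE  rn = 1                  -- newest remaining row per id
    ORDER  BY id;
    ```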
    

    SQL Fiddle showing both.
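
    To try the queries without the fiddle, a minimal setup that mirrors the sample data from the question might look like this. The table name tbl comes from the queries above; the column types are assumptions, since the question does not state them.

    ```sql
    -- Assumed schema: types are guesses based on the sample rows.
    CREATE TABLE tbl (
       id     int
     , code   text
     , date   date
     , latest int
    );

    INSERT INTO tbl (id, code, date, latest) VALUES
      (1, 'T', '2014-10-04', 0)
    , (2, 'B', '2014-10-19', 0)
    , (2, 'B', '2014-10-26', 0)
    , (1, 'S', '2014-10-05', 0)
    , (1, 'T', '2014-10-06', 0)
    , (1, 'T', '2014-10-08', 1)
    , (2, 'P', '2014-10-27', 1);
    ```

    With these rows, both queries should return (1, S) and (2, B), matching the output requested in the question.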