Search code examples
postgresqlduplicatesgreatest-n-per-group

PostgreSQL select rows with specific columns identical


I need to get rid of rows that have the same value in only specific columns. For instance, in the extract below, I would like to select all the rows except the last one which is equal to the second-to-last row for columns CODE, START_DATE and TYPE (this means ignoring the value for END_DATE column).

  code         | start_date     | end_date     | type
---------------+----------------+--------------+------
C086000-T10001 | 2014-11-11     | 2014-11-12   | 01
C086000-T10001 | 2014-11-11     | 2014-11-11   | 03
C086000-T10002 | 2014-12-03     | 2014-12-10   | 03
C086000-T10002 | 2014-01-03     | 2014-01-04   | 03
C086000-T10003 | 2012-02-27     | 2014-02-28   | 03
C086000-T10003 | 2014-08-11     | 2014-11-12   | 01
C086000-T10003 | 2014-08-11     | 2014-08-20   | 01

How could I perform this?

Edit: The following query returns a too much columns for a subquery error message:

SELECT * FROM my_table WHERE code NOT IN (SELECT DISTINCT code, start_date, type FROM my_table) ;

Many thanks for help!


Solution

  • This can be done using Postgres' distinct on operator:

    select distinct on (code, start_date, type) code, start_date, end_date, type
    from the_table
    order by code, start_date, type;
    

    If you prefer to use standard SQL, this can also be done using window functions:

    select code, start_date, end_date, type
    from (
        select code, start_date, end_date, type, 
               row_number() over (partition by code, start_date, type order by end_date) as rn
        from the_table
    ) t
    where rn = 1
    order by code, start_date, type;
    

    SQLFiddle example: http://sqlfiddle.com/#!15/c5044/1