Search code examples
sqlpostgresqlgreenplum

Insert into table unique rows by postgresql


I have some code stats in greenplum table A

| id  | file   | repo | lang | line |
-------------------------------------
| a   | /a.txt | r1   | txt  | 3    |
| a   | /b.c   | r1   | c    | 5    |
| b   | /x.java| r1   | java | 33   |
| c   | /f.cpp | r2   | c++  | 23   |
| a   | /a.txt | r3   | txt  | 3    |
| a   | /b.c   | r3   | c    | 5    |

but the last two rows code indicate this code is come form repo r1, because the commit id is same with first two rows. I want to remove the duplicate rows, and insert result to table B:

| id  | file   | repo | lang | line |
-------------------------------------
| a   | /a.txt | r1   | txt  | 3    |
| a   | /b.c   | r1   | c    | 5    |
| b   | /x.java| r1   | java | 33   |
| c   | /f.cpp | r2   | c++  | 23   |

the row can be distinct by: id + file + repo

Thanks in advance.


Solution

  • You can use NOT EXISTS to check that a duplicate does not exist:

    SELECT *
    FROM t
    WHERE NOT EXISTS (
        SELECT 1
        FROM t AS x
        WHERE x.id   = t.id
        AND   x.file = t.file
        AND   x.repo < t.repo
    )
    

    SQL Fiddle