sql, postgresql, distinct-on

Why doesn't my DISTINCT ON expression work?


Query:

SELECT DISTINCT ON (geom_line),gid 
FROM edge_table;

I have an edge table that contains duplicate edges, and I want to remove the duplicates while keeping one of each. Why is this syntax wrong?


Solution

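  • The comma is the problem. In DISTINCT ON (expression) select_list, the parenthesized expression is not part of the select list, so the select list follows the closing parenthesis directly, with no comma in between.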

    If you want geom_line included in the result, use

    SELECT DISTINCT ON (geom_line) geom_line, gid FROM edge_table;
    

    Otherwise, use

    SELECT DISTINCT ON (geom_line) gid FROM edge_table;
    

    But if your objective is just to remove duplicates, I'd say that you should use

    SELECT DISTINCT geom_line, gid FROM edge_table;
    

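    DISTINCT guarantees uniqueness over the whole selected row, while DISTINCT ON guarantees uniqueness only over the expressions in parentheses. If several rows share the same value for those expressions, only one of them is returned. Without ORDER BY, which row that is cannot be predicted; with an ORDER BY clause, the first row in that order is kept, and the DISTINCT ON expressions must match the leftmost ORDER BY expressions. Note that if gid is unique (for example, a primary key), every (geom_line, gid) pair is already distinct, so SELECT DISTINCT geom_line, gid returns every row; in that case DISTINCT ON (geom_line) is what gives you one row per edge.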

    DISTINCT a, b is the same as DISTINCT ON (a, b) a, b.
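A minimal sketch of these points, assuming PostgreSQL. It builds a throwaway temporary edge_table in which geom_line is plain text standing in for the real (presumably PostGIS) geometry column; the linestrings and gids are made-up sample data. The SELECT adds ORDER BY so the row kept for each geom_line is deterministic (the lowest gid). Neither DISTINCT nor DISTINCT ON changes the table itself, so the final statement shows one possible way to actually delete the duplicate edges, assuming gid is unique and never NULL.

```sql
-- Hypothetical sample table: geom_line is text here, standing in for a
-- PostGIS geometry column, so the example runs without PostGIS.
CREATE TEMP TABLE edge_table (
    gid       integer PRIMARY KEY,
    geom_line text
);

INSERT INTO edge_table (gid, geom_line) VALUES
    (1, 'LINESTRING(0 0,1 1)'),
    (2, 'LINESTRING(0 0,1 1)'),  -- duplicate of gid 1
    (3, 'LINESTRING(1 1,2 2)'),
    (4, 'LINESTRING(1 1,2 2)'),  -- duplicate of gid 3
    (5, 'LINESTRING(2 2,3 3)');

-- One row per geom_line. ORDER BY decides which row is kept (lowest gid);
-- its leftmost expression must match the DISTINCT ON expression.
-- Returns gids 1, 3 and 5.
SELECT DISTINCT ON (geom_line) geom_line, gid
FROM edge_table
ORDER BY geom_line, gid;

-- Actually remove the duplicates from the table: delete every row whose
-- gid is not the one DISTINCT ON keeps for its geom_line.
-- (Assumes gid is never NULL; NOT IN misbehaves with NULLs.)
DELETE FROM edge_table
WHERE gid NOT IN (
    SELECT DISTINCT ON (geom_line) gid
    FROM edge_table
    ORDER BY geom_line, gid
);
```

Changing the ORDER BY in the subquery to geom_line, gid DESC would keep the highest gid of each group instead.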