Search code examples
postgresqlduplicatesprimary-keydelete-rowdbeaver

Duplicates - Deletion under PostgreSQL


I am working on tables where primary keys are sometimes not defined.

I have observed the following behavior while trying to delete duplicates using the row deletion button of the interface: Deletion of one row (1 of the duplicate) leads to deletion of all similar rows (loss of the data) under Navicat.

Example: In table table_test containing 2 duplicates Deletion of row 1 leads to deletion of rows 1 and 2.

table_test

a b c
A A A
A A A
B B B
B B B

Q1: Could you explain me the behavior of the database when there is no primary key? Why are those rows treated similarly internally? (I know that there is a link with the fact that there is no unique identifier but more precisely.)

Q2: Under DBeaver, it is possible to delete the duplicates in defining a virtual primary key on all columns. How does the virtual primary key work? It is not possible to define a primary key on columns (a, b, c) due to the presence of duplicates. How is the 'virtual' primary key created?


Solution

  • I would question Q2; DBeaver does not do that on its own. The following gives does what you are after (run in DBeaver - but does not depend on it). (see demo). It generates a virtual id for each column combination using the Window Function row_number(), and selects the ctid for each of those rows. Note the ctid is guaranteed to be unique. (See demo)

    delete 
      from test_table 
     where ctid in (select ctid 
                      from (select ctid
                                 , row_number() over (partition by a, b, c) rn 
                              from test_table
                        ) gctid
                    where rn > 1
                   );