Search code examples
postgresqlduplicatessql-delete

Remove duplicates based on condition and keep oldest element


Currently, the table is ordered in ascending order by row_number. I need help removing duplicates based on 2 conditions.

  1. If there is a stage, that is online then I want to keep that row, doesn't matter which one, there can be multiple.
  2. If there isn't a row with online for that org_id, then I keep row_number = 1 which would be the oldest element.
sales_id org_id stage row_number
ccc_123 ccc off-line 1
ccc_123 ccc off-line 2
ccc_123 ccc online 3
abc_123 abc off-line 1
abc_123 abc power-off 2
zzz_123 zzz power-off 1

so the table should look like this after:

sales_id org_id stage
ccc_123 ccc online
abc_123 abc off-line
zzz_123 zzz power-off

Looks like this, stackoverflow not working well with second table for some reason


Solution

  • I would use a combination of a CASE statement to modify the rownumber of records with stage='online' and then use ROW_NUMBER to allow me to filter for the lowest value in a group.

    http://sqlfiddle.com/#!17/1421b/5

    create table sales_stage (
        sales_id varchar,
        org_id varchar,
        stage varchar,
        row_num int);
    
    insert into sales_stage (sales_id, org_id, stage, row_num) values 
    ('ccc_123', 'ccc', 'off-line', 1),
    ('ccc_123', 'ccc', 'off-line', 2),
    ('ccc_123', 'ccc', 'online', 3),
    ('abc_123', 'abc', 'off-line', 1),
    ('abc_123', 'abc', 'power-off', 2),
    ('zzz_123', 'zzz', 'power-off', 1);
    
    SELECT 
      sales_id, org_id, stage
    FROM
    (
      SELECT 
        sales_id, org_id, stage, 
        ROW_NUMBER() OVER(PARTITION BY sales_id, org_id ORDER BY row_num) as rn
      FROM (
          SELECT sales_id, org_id, stage, 
          CASE WHEN stage='online' THEN -999 ELSE row_num END as row_num
          FROM sales_stage
        ) x
     ) y
    WHERE rn = 1
    

    tabular results