Search code examples
sqlpostgresqlduplicatessql-delete

Delete duplicates by keeping the cheapest price only


I am working on an product catalog (ecommerce), stored in a PostgreSQL database. I currently have duplicates. I would like to remove those duplicated products by keeping the cheapest one only.

The fields in database that are important :

ID [PK]   SKU   EAN    Price    ....
1         SKU1  123    45.0     ....
2         SKU2  456    36.0     ....
3         SKU3  123    40.0     ....
4         SKU4  789    58.0     ....
5         SKU5  123    38.0     ....
...

I have a SERIAL PRIMARY KEY on the field ID. I have a NOT NULL SKU, a NOT NULL EAN-13 code and a NOT NULL price for each product.

We can see that the EAN "123" is duplicated several times. I would like to find a SQL request that deletes all duplicates (all the line), by keeping only ONE, which would have the lowest price.

We would have :

ID [PK]   SKU   EAN    Price    ....
2         SKU2  456    36.0     ....
4         SKU4  789    58.0     ....
5         SKU5  123    38.0     ....
...

To know : the number of duplicates can be variable. Here is an example with 3 products with the same EAN, but we could have 2, 4, 8 or 587...

So far I've been able to delete the duplicate with the lowest or greatest ID in the case of 2 duplicates only, but it's not what I am trying to find...

    FROM 
        (SELECT Price,
         MIN(Price) OVER( PARTITION BY ean ORDER BY  Price DESC ) AS row_num FROM TABLE ) t
        WHERE t.row_num > 1 );

Solution

  • I would do this using a correlated subquery:

    delete from mytable t
    where t.price > (select min(t2.price) from mytable t2 where t2.sku = t.sku);