I am working on an product catalog (ecommerce), stored in a PostgreSQL database. I currently have duplicates. I would like to remove those duplicated products by keeping the cheapest one only.
The fields in database that are important :
ID [PK] SKU EAN Price ....
1 SKU1 123 45.0 ....
2 SKU2 456 36.0 ....
3 SKU3 123 40.0 ....
4 SKU4 789 58.0 ....
5 SKU5 123 38.0 ....
...
I have a SERIAL PRIMARY KEY on the field ID. I have a NOT NULL SKU, a NOT NULL EAN-13 code and a NOT NULL price for each product.
We can see that the EAN "123" is duplicated several times. I would like to find a SQL request that deletes all duplicates (all the line), by keeping only ONE, which would have the lowest price.
We would have :
ID [PK] SKU EAN Price ....
2 SKU2 456 36.0 ....
4 SKU4 789 58.0 ....
5 SKU5 123 38.0 ....
...
To know : the number of duplicates can be variable. Here is an example with 3 products with the same EAN, but we could have 2, 4, 8 or 587...
So far I've been able to delete the duplicate with the lowest or greatest ID in the case of 2 duplicates only, but it's not what I am trying to find...
FROM
(SELECT Price,
MIN(Price) OVER( PARTITION BY ean ORDER BY Price DESC ) AS row_num FROM TABLE ) t
WHERE t.row_num > 1 );
I would do this using a correlated subquery:
delete from mytable t
where t.price > (select min(t2.price) from mytable t2 where t2.sku = t.sku);