Search code examples
sqldb2duplicatesibm-midrange

DB2 SQL - Remove duplicates where some duplicates are valid


I would like to delete duplicate records from XRDK/ENAUDFP_RK, I have tried a 'where exists' clause with the below SQL but this deleted all records in the file rather than just the duplicates.

To give a little background; the issue here is a duplicated daily sales file - it is entirely possible for a store to sell more than 1 of the same item per day, therefore some of the duplicates are not actually duplicates, only every other row is a duplicate.

I.e. if a store sold 2 of them same item, there would be 4 records - 2 genuine sales, 2 duplicates.

This is the SQL I used to highlight the duplicates;

SELECT * FROM                                                     
  (SELECT ROWNUMBER() OVER (                                      
   PARTITION BY                                                   
   INCOMP, INTTYP, INTDAT, INMDAT, INSDAT, INTCCY, INERAT, INDEPT,
   INSTYL, INCOLO, INMATL, INSIZE, INFIT, INVENX, INEAN, INSKU,   
   INBATC, INPSIT, INFSIT, INTSIT, INSUPP, INPONO, INSCCY, INREF1,
   INREF2, INREF3, INSREF, INDREF, INUNIT, INRETL, INCOST, INLRET,
   INLCST, INSCST, INFGRP, INFCAT, INFSEC, INTGRP, INTCAT, INTSEC,
   INSTEN, INSTEV, INSVAT, INPLUL, INSVTF, INSFOR, INSFOV, INVCOD,
   INTO01, INSP01, INSP02, INSP03, INFNUM, INFDES, INRCOD, INRC01,
   INRC02, INRC03, INRC04, INRC05, INRC06, INRC07, INRC08, INRC09,
   INRC10, INRV01, INRV02, INRV03, INRV04, INRV05, INRV06, INRV07,
   INRV08, INRV09, INRV10, INDELD, INCUID, INIREF, INSOUR, INSNUM)
   AS RN, ENAUDFP_RK.*                                            
   FROM   XRDK/ENAUDFP_RK) AS A                                   
 WHERE RN in('2', '4', '6', '8', '10', '12')

Solution

  • You can use rrn for this in DB2

    DELETE FROM XRDK/ENAUDFP_RK f0
    WHERE rrn(f0) in (
     SELECT RW FROM                                                     
     (SELECT ROWNUMBER() OVER (                                      
     PARTITION BY                                                   
     INCOMP, INTTYP, INTDAT, INMDAT, INSDAT, INTCCY, INERAT, INDEPT,
     INSTYL, INCOLO, INMATL, INSIZE, INFIT, INVENX, INEAN, INSKU,   
     INBATC, INPSIT, INFSIT, INTSIT, INSUPP, INPONO, INSCCY, INREF1,
     INREF2, INREF3, INSREF, INDREF, INUNIT, INRETL, INCOST, INLRET,
     INLCST, INSCST, INFGRP, INFCAT, INFSEC, INTGRP, INTCAT, INTSEC,
     INSTEN, INSTEV, INSVAT, INPLUL, INSVTF, INSFOR, INSFOV, INVCOD,
     INTO01, INSP01, INSP02, INSP03, INFNUM, INFDES, INRCOD, INRC01,
     INRC02, INRC03, INRC04, INRC05, INRC06, INRC07, INRC08, INRC09,
     INRC10, INRV01, INRV02, INRV03, INRV04, INRV05, INRV06, INRV07,
     INRV08, INRV09, INRV10, INDELD, INCUID, INIREF, INSOUR, INSNUM)
     AS RN, f1.*, rrn(f1) RW                                            
     FROM   XRDK/ENAUDFP_RK f1) AS A                                   
     WHERE A.RN in('2', '4', '6', '8', '10', '12')
     )