Search code examples
teradatadelete-row

Delete only one column from the Target table even if there are multiple similar columns (Teradata)


I came across a situation where i have to delete a column from a table based on a condition from other table Let me break it down to you!

There is a master table called MORTALITY (containing info regarding deceased individuals) And another table called INC_MORTALITY (incremental mortality) table which is refreshed on a weekly basis Note: Both the tables have similar format

So this week’s new records, containing both additional deceased individuals as well as updates of old data for previously delivered records. This is a single file with a column (OP_DIRECTIVE) specifying if it is an “add” or “delete” record.

Processing Weekly Files To incorporate the weekly update file, we need to execute the following steps in order. 1. Delete rows in the master table which have a OP_DIRECTIVE = 'D' as the operation in the weekly update. For a given delete row, you should delete a single row in the master table which matches the delete record on all fields aside from the “D” operation column. Warning: please ensure you only delete, or mark as deleted, one record, even if more than one historical record fully matches this new delete record. 
 2. Add rows in the master table which appear in the “Add” file. 
 Upon completion of these steps, your master table should be the most up to date master of deaths.

 (Note: THESE TABLES DOES NOT HAVE PRIMARY KEYS) SO WHAT I TRIED:

DEL FROM MORTALITY MI
WHERE MI.DATA_SOURCE    = INC_MORTALITY.DATA_SOURCE
AND MI.DD_IMP_FLAG      = INC_MORTALITY.DD_IMP_FLAG
AND MI.DOB              = INC_MORTALITY.DOB
AND MI.DOD                = INC_MORTALITY.DOD
AND MI.DEATH_VERIFICATION = INC_MORTALITY.DEATH_VERIFICATION
AND MI.GENDER_PROBABILITY = INC_MORTALITY.GENDER_PROBABILITY
AND MI.GENDER           = INC_MORTALITY.GENDER
AND MI.TOKEN_1          = INC_MORTALITY.TOKEN_1
AND MI.TOKEN_2          = INC_MORTALITY.TOKEN_2
AND MI.TOKEN_4          = INC_MORTALITY.TOKEN_4
AND MI.TOKEN_5          = INC_MORTALITY.TOKEN_5
AND MI.TOKEN_7          = INC_MORTALITY.TOKEN_7
AND MI.TOKEN_16         = INC_MORTALITY.TOKEN_16
AND MI.TOKEN_KEY        = INC_MORTALITY.TOKEN_KEY
AND INC_MORTALITY.OP_DIRECTIVE = 'D'

The above Delete statement will delete all the rows satisfying the conditions, my requirement is to delete only one record even if more than one historical record fully matches this new delete record, and if i include ROW NUMBER() stmt like below my DELETE stmt is not working

QUALIFY ROW_NUMBER() OVER (PARTITION BY MI.DATA_SOURCE,MI.DOB,MI.DOD
ORDER BY MI.DOD DESC ) = 1

Any suggestions on how to approach this scenario, Thanks!!


Solution

  • Approach to solution: Copy unmatched rows to a work table, then truncate the original table and replace with contents of the work table. One way to identify unmatched rows would be to tag each of the input rows in a set of duplicates with a unique number, something like this:

    INSERT work_table SELECT MI.col1, MI.col2, ...
    FROM 
      (SELECT M.*,
       ROW_NUMBER() OVER (PARTITION BY <join cols> ORDER BY <some col(s)>) AS ROWNUM
       FROM MORTALITY M) MI
    LEFT JOIN 
      (SELECT I.*, 
       ROW_NUMBER() OVER (PARTITION BY <join cols> ORDER BY <some col(s)>) AS ROWNUM
       FROM INC_MORTALITY I
       WHERE OP_DIRECTIVE='D') INC
    ON MI.join_col1 = INC.join_col1
    AND MI.join_col2 = INC.join_col2
    ...
    AND MI.ROWNUM = INC.ROWNUM
    WHERE INC.ROWNUM IS NULL /* "anti-join" keeps only unmatched rows */
    ;
    DELETE FROM MORTALITY;
    INSERT MORTALITY SELECT * FROM work_table;
    

    If INC_MORTALILTY never has duplicates, then you can eliminate numbering that relation and change the last join condition to MI.ROWNUM = 1 and use one of the other JOIN columns for the NULL check.