Search code examples
teradatasql-insertsql-merge

Insert non-duplicates to master table using MERGE/INSERT from Incremental table in Teradata


I came across a situation where there is a master table called MORTALITY (containing info regarding deceased individuals) And another table called INC_MORTALITY (incremental mortality) table which is refreshed on a weekly basis Note: Both the tables have similar format

So INC_MORTALITY has this week’s new records, containing both additional deceased individuals as well as updates of old data for previously delivered records. This is a single file with a column (OP_DIRECTIVE) specifying if it is an “add” or “delete” record.

Processing Weekly Files To incorporate the weekly update file: we need to Add rows in the master table which have OP_DIRECTIVE = 'A' as the operation in the weekly update. I am currently using the below stmt to INSERT the records

INSERT INTO db.MORTALITY
    SELECT
    DATA_SOURCE,
    OP_DIRECTIVE,
    DD_IMP_FLAG,
    DOB,
    DOD,
    DEATH_VERIFICATION,
    GENDER_PROBABILITY,
    GENDER,
    TOKEN_1,
    TOKEN_2,
    TOKEN_4,
    TOKEN_5,
    TOKEN_7,
    TOKEN_16,
    TOKEN_KEY
    FROM db.INC_MORTALITY INC
    WHERE INC.OP_DIRECTIVE = 'A';
--The above Query fetches all the records to be added to the Master Table - MORTALITY(My Requirement) and for some reason if INC_MORTALITY is not refreshed after a week i will be updating the same records again to the master file.

Now have I updated primary index (token_1, token_2) in my table definition, i know it will throw an ERROR if we are trying to insert records with the same values of token_1 & token_2 but this will be an automated process and my job will FAIL if there is an ERROR so, i want it to check PRIOR to the Insert process if the records are already inserted, so that it will skip inserting duplicates like using IFNOTEXISTS stmt

Adding to the above: Can i use MERGE WHEN NOT MATCHED THEN INSERT operation here like below:

MERGE db.MORTALITY B
USING (SELECT
    DATA_SOURCE,
    OP_DIRECTIVE,
    DD_IMP_FLAG,
    DOB,
    DOD,
    DEATH_VERIFICATION,
    GENDER_PROBABILITY,
    GENDER,
    TOKEN_1,
    TOKEN_2,
    TOKEN_4,
    TOKEN_5,
    TOKEN_7,
    TOKEN_16,
    TOKEN_KEY
    FROM db.INC_MORTALITY
    WHERE OP_DIRECTIVE = 'A') A
    ON  A.TOKEN_1 = B.TOKEN_1
    AND A.TOKEN_2 = B.TOKEN_2
WHEN NOT MATCHED THEN
INSERT (A.DATA_SOURCE,A.OP_DIRECTIVE,A.DD_IMP_FLAG,A.DOB,A.DOD,A.DEATH_VERIFICATION,A.GENDER_PROBABILITY,A.GENDER,A.TOKEN_1,A.TOKEN_2,A.TOKEN_4,A.TOKEN_5,A.TOKEN_7,A.TOKEN_16,A.TOKEN_KEY);

The above Query is executing fine in Teradata Studio but not inserting any records from INC_MORTALITY TO MORTALITY(master) table, am I doing anything wrong?

UPDATE:


DROP TABLE IF EXISTS MORTALITY;


CREATE MULTISET  TABLE MORTALITY,
    NO BEFORE JOURNAL,
    NO AFTER JOURNAL,
    NO FALLBACK,
    CHECKSUM = DEFAULT
(

    DATA_SOURCE          VARCHAR(25) NULL 
        TITLE 'Data Source',
    OP_DIRECTIVE         VARCHAR(25) NULL 
        TITLE 'Operation Directive',
    DD_IMP_FLAG          INTEGER NULL 
        TITLE 'Death Date Imputation Flag',
    DOB                  DATE NULL 
        TITLE 'Date Of Birth',
    DOD                  DATE NULL 
        TITLE 'Date Of Death',
    DEATH_VERIFICATION   INTEGER NULL 
        TITLE 'Death Verification',
    GENDER_PROBABILITY   FLOAT NULL 
        TITLE 'Gender Probability Score',
    GENDER               VARCHAR(25) NULL 
        TITLE 'Gender',
    TOKEN_1              VARCHAR(44) NULL 
        TITLE 'Token 1',
    TOKEN_2              VARCHAR(44) NULL 
        TITLE 'Token 2',
    TOKEN_4              VARCHAR(44) NULL 
        TITLE 'Token 4',
    TOKEN_5              VARCHAR(44) NULL 
        TITLE 'Token 5',
    TOKEN_7              VARCHAR(44) NULL 
        TITLE 'Token 7',
    TOKEN_16             VARCHAR(44) NULL 
        TITLE 'Token 16',
    TOKEN_KEY            VARCHAR(44) NULL 
        TITLE 'Token Key'
)
    PRIMARY INDEX MORTALITY_IX1
     (
            TOKEN_1,
            TOKEN_2
     );

This is the Table DDL for MORTALITY and INC_MORTALITY also has similar format. The combination of both the columns TOKEN_1,TOKEN_2 identify a unique record and as you can see above all of them are NULLABLE columns As i previously stated my very first INSERT stmt serves the purpose but i just want to avoid getting an error like this: Duplicate unique prime key error in db.MORTALITY, if the records are already inserted I hope the information helps, Thanks


Solution

  • I've been doing SOME research myself and figured out the below approaches:

    1) Using INSERT This query worked for me, it inserts records only which are in INC_MORTALITY but not in MORTALITY (avoiding duplicates)

        INSERT INTO db.MORTALITY 
        SELECT * FROM db.INC_MORTALITY INC
        WHERE INC.OP_DIRECTIVE = 'A'
        EXCEPT
        SELECT * FROM db.MORTALITY_INDEX
    

    2) Similarly using MERGE

    MERGE db.MORTALITY MI
    USING (SELECT
        DATA_SOURCE,
        OP_DIRECTIVE,
        DD_IMP_FLAG,
        DOB,
        DOD,
        DEATH_VERIFICATION,
        GENDER_PROBABILITY,
        GENDER,
        TOKEN_1,
        TOKEN_2,
        TOKEN_4,
        TOKEN_5,
        TOKEN_7,
        TOKEN_16,
        TOKEN_KEY
        FROM db.INC_MORTALITY
        WHERE OP_DIRECTIVE = 'A') INC
        ON  MI.DATA_SOURCE        = INC.DATA_SOURCE
        AND MI.DD_IMP_FLAG        = INC.DD_IMP_FLAG
        AND MI.DOB                = INC.DOB
        AND MI.DOD                = INC.DOD
        AND MI.DEATH_VERIFICATION = INC.DEATH_VERIFICATION
        AND MI.GENDER_PROBABILITY = INC.GENDER_PROBABILITY
        AND MI.GENDER             = INC.GENDER
        AND MI.TOKEN_1            = INC.TOKEN_1
        AND MI.TOKEN_2            = INC.TOKEN_2
        AND MI.TOKEN_4            = INC.TOKEN_4
        AND MI.TOKEN_5            = INC.TOKEN_5
        AND MI.TOKEN_7            = INC.TOKEN_7
        AND MI.TOKEN_16           = INC.TOKEN_16
        AND MI.TOKEN_KEY          = INC.TOKEN_KEY
    WHEN NOT MATCHED THEN
    INSERT (DATA_SOURCE,
            OP_DIRECTIVE,
            DD_IMP_FLAG,
            DOB,
            DOD,
            DEATH_VERIFICATION,
            GENDER_PROBABILITY,
            GENDER,
            TOKEN_1,
            TOKEN_2,
            TOKEN_4,
            TOKEN_5,
            TOKEN_7,
            TOKEN_16,
            TOKEN_KEY) 
    VALUES (INC.DATA_SOURCE,
            INC.OP_DIRECTIVE,
            INC.DD_IMP_FLAG,
            INC.DOB,
            INC.DOD,
            INC.DEATH_VERIFICATION,
            INC.GENDER_PROBABILITY,
            INC.GENDER,
            INC.TOKEN_1,
            INC.TOKEN_2,
            INC.TOKEN_4,
            INC.TOKEN_5,
            INC.TOKEN_7,
            INC.TOKEN_16,
            INC.TOKEN_KEY);
    

    Thanks to everyone who contributed to my question!!