The table is loaded every night with some data (around 100K rows). Then my batch job reads each record from the table and processes it. My requirement is to only process those records who have changes into specific columns because there is no point in processing a new record if it does not have any changes compared to yesterday's record (and it unnecessary delays whole processing of the job).
Please assume below table structure
Table Name:
STAGING_T
Column Names:
PK_1 | COL1 | COL2 | COL3 | COL4 | CREATE_DATE | FLAG
CREATE_DATE is the sysdate when the data is loaded into this table. This is used to identify day wise data loaded into this table.
Common columns based on which I identify the same records of yesterday and today are COL1 and COL2.
**Sample Data**
PK_1 COL1 COL2 COL3 COL4 CREATE_DATE FLAG
1 1000 2000 a x 31.01.2019
2 1000 2001 b y 31.01.2019
3 1000 2002 c z 31.01.2019
4 1000 2000 aa x 30.01.2019
5 1000 2001 b y 30.01.2019
6 1000 2003 d z 30.01.2019
**Expected Output**
PK_1 COL1 COL2 COL3 COL4 CREATE_DATE FLAG
1 1000 2000 a x 31.01.2019 UPDATE
2 1000 2001 b y 31.01.2019 NO CHANGE
3 1000 2002 c z 31.01.2019 INSERT
4 1000 2000 aa x 30.01.2019
5 1000 2001 b y 30.01.2019
6 1000 2003 d z 30.01.2019
Any help or suggestions would be great. Thank you.
You can first generate the desired values and then use merge statement. You may try below -
MERGE INTO DATA T
USING (SELECT D.PK_1, D.COL1, D.COL2, D.COL3, CREATE_DATE, D.COL4,
CASE WHEN CREATE_DATE = (SELECT MAX(CREATE_DATE) FROM DATA) THEN
CASE WHEN LAG(COL3 || '`'||COL4) OVER(PARTITION BY COL1,COL2 ORDER BY CREATE_DATE) IS NULL
THEN 'INSERT'
WHEN LAG(COL3 || '`'||COL4) OVER(PARTITION BY COL1,COL2 ORDER BY CREATE_DATE) = COL3 || '`'||COL4
THEN 'NO CHANGE'
ELSE 'UPDATE'
END
END FLAG
FROM DATA D)
S ON (T.PK_1 = S.PK_1)
WHEN MATCHED THEN UPDATE SET T.FLAG = S.FLAG;
Here is the demo.