I have the following scenario in a auditing table (AUDIT_TABLE).
t_id e_id detail_log date_created
01 111 USER_1; Salary: from '25' to '30'; Dept_ID: from '001' to ''; 01/01/2019
02 111 USER_1; Dept_ID: from '' to '001'; 01/01/2019
03 001 USER_1; Dept_ID: from '012' to ''; 01/01/2019
04 002 USER_1; Dept_ID: from '555' to '666'; 01/01/2019
05 222 USER_1; Dept_ID: from '' to '123'; 01/02/2019
06 333 USER_1; Salary: from '10' to '20'; Dept_ID: from '200' to ''; 01/03/2019
07 444 USER_1; Salary: from '50' to '100'; Dept_ID: from '' to '10'; 01/04/2019
There's a bug in the batch processes that trigger inserts into this audit table and creates these records "Dept_ID: from .. to
" when there is no change in the "Dept_ID
" value (in another table). There are millions of these records that need to be cleaned up. The records that have multiple field changes captured i.e. item_id 01
needs to be updated to clear the Dept_ID
: audit message and the records that have only the Dept_ID
audit record needs to be deleted (item_id 02
). There may be other pairs that have only the Dept ID
audit message in both the records, in which case both need to be deleted. The trigger logic has been fix so no more false records are being created when there is no actual change in the Dept ID
, however the records already created during the bug need to be cleaned up. There may be rows where there is only one record of the pair in which case these do not need to be updated/deleted because the Dept_ID was actually changed to a null or changed from a null to a value.
So the following should exist after fixing the above data-set:
t_id e_id detail_log date_created
01 111 USER_1; Salary: from '25' to '30'; 01/01/2019
03 001 USER_1; Dept_ID: from '012' to ''; 01/01/2019
04 002 USER_1; Dept_ID: from '555' to '666'; 01/01/2019
05 222 USER_1; Dept_ID: from '' to '123'; 01/02/2019
06 333 USER_1; Salary: from '10' to '20'; Dept_ID: from '200' to ''; 01/03/2019
07 444 USER_1; Salary: from '50' to '100'; Dept_ID: from '' to '10'; 01/04/2019
I have delete and update statements ready to do this but if I delete one record of the pair, then the update won't find the other record since it depends on the deleted record to find it and vice versa for the update statement. I thought to use the merge statement but not sure how to. Any ideas?
So assuming such test data
DETAIL_LOG
-----------------------------------------------------------------------------------------
USER_1; Salary: from '25' to '30'; Dept_ID: from '001' to '';
USER_1; Dept_ID: from '' to '001';
USER_1; Salary: from '25' to '30'; Dept_ID: from '001' to '002'; Prdeel: from '0' to '1':
USER_1; Dept_ID: from '' to '';
if I get you right, you want to remove the Dept_ID
entry from the first and second row because one of the values is NULL. I added forth row with both NULL value which should be removed as well.
The third row remain unchanged, because both value are filled.
You need this regexp to replace the data
q'[(Dept_ID: from '.*' to '';|Dept_ID: from '' to '.*';)]'
Note that the bar in the middle represents OR. The left part matches the departments with to
value NULL, the rigth part the department with from
value NULL.
The matched string is replaced with a NULL string.
To limit the scope of updated records the exact logik of the buggy records must be defined. Here an example where I expect that both records must have the same e_id
and are ordered on t_id
Using LEAD
and LAG
you examines the following and preceeding recods to check if the condition of the change to emtpy and change from empty is fullfilled.
Note that I'm using LIKE
to filter the rows to get better performance.
Query to final check before the update:)
with al as (
select T_ID, E_ID, DETAIL_LOG,
lead(DETAIL_LOG) over (partition by e_id order by t_id) DETAIL_LOG_LEAD,
lag(DETAIL_LOG) over (partition by e_id order by t_id) DETAIL_LOG_LAG
from AUDIT_TABLE)
select T_ID, E_ID,
/* updated entry */
regexp_replace(detail_log, q'[(Dept_ID: from '.*' to '';|Dept_ID: from '' to '.*';)]', '') DETAIL_LOG
from al
where (DETAIL_LOG like q'[%Dept_ID: from '_%' to '';%]' and /* first wrong record */
DETAIL_LOG_LEAD like q'[%Dept_ID: from '' to '_%';%]') OR
(DETAIL_LOG like q'[%Dept_ID: from '' to '_%';%]' and /* second wrong record */
DETAIL_LOG_LAG like q'[%Dept_ID: from '_%' to '';%]')
;
returns
T_ID E_ID DETAIL_LOG
---------- ---------- ------------
1 111 USER_1; Salary: from '25' to '30';
2 111 USER_1;
Update
The UPDATE
statement is a simple reformulation of the query above using IN
(subquery) to limit the scope.
update AUDIT_TABLE
set DETAIL_LOG = regexp_replace(detail_log, q'[(Dept_ID: from '.*' to '';|Dept_ID: from '' to '.*';)]', '')
where (T_ID, E_ID) in
-- query from above that limits the updated rows
To delete the empty audit records after this cleanup is a trivial step.