sqloracleplsqlcursorprocedure

Compare two tables and update the mismatch column in Oracle SQL


I have a requirement to compare with one table and update another table if there are mismatches. Table LEGACY_CUST_INFO has correct CUST_PRIVILEGE_NUMBER. So, I need to findout the wrong CUST_PRIVILEGE_NUMBER in second table CUST_INFO and update with the right CUST_PRIVILEGE_NUMBER from first table LEGACY_CUST_INFO. ROW_ID in both the tables match are an exact match. Both tables have approximately 20 million records.

Below are the two table structures

Table 1 Name : LEGACY_CUST_INFO


ROW_ID  CUST_PRIVILEGE_NUMBER

Table 2 Name : CUST_INFO

ROW_ID CUST_PRIVILEGE_NUMBER

I am getting performance issues when I check with the below statements. Is there a better way to create a table with mismatch records and update the column CUST_PRIVILEGE_NUMBER in CUST_INFO table ? Below is my code and logic.

  1. I am creating a table - EXACT_RECORDS with exact match records.
  2. Deleting the exact match records from LEGACY_CUST_INFO table.
  3. Updating the CUST_PRIVILEGE_NUMBER based on LEGACY_CUST_INFO table.
CREATE TABLE EXACT_RECORDS
AS
(SELECT A.* FROM LEGACY_CUST_INFO A, CUST_INFO B WHERE A.ROW_ID = B.ROW_ID AND A.CUST_PRIVILEGE_NUMBER = B.CUST_PRIVILEGE_NUMBER);
DELETE FROM LEGACY_CUST_INFO WHERE ROW_ID IN (SELECT ROW_ID FROM EXACT_RECORDS);
UPDATE CUST_INFO B SET B.CUST_PRIVILEGE_NUMBER = (SELECT A.CUST_PRIVILEGE_NUMBER FROM LEGACY_CUST_INFO A WHERE A.ROW_ID = B.ROW_ID);

I know this logic is very basic. Could you help in better way so that 20 millions records are updated in less than 30 minutes ?


Solution

  • I suggest to using a merge statement containing a query, returning prepared not matched records for update.

    merge into cust_info dst
    using (
    select l.row_id, l.cust_privilege_number
    from legacy_cust_info lci
    inner join cust_info ci on ci.row_id = lci.row_id
    where ci.cust_privilege_number != lci.cust_privilege_number
    ) src
    on (dst.row_id = src.row_id)
    when matched then 
      update set dst.cust_privilege_number = src.cust_privilege_number;
    

    Also consider create indexes for your tables:

    CREATE INDEX legacy_cust_info#rid#cpn#idx ON LEGACY_CUST_INFO (row_id, cust_privilege_number);
    CREATE INDEX cust_info#row_id#idx ON CUST_INFO (row_id);
    

    UPD. You can estimate the time by running the nested select without merging its contents into the destination table:

    select count(1) from (
      select l.row_id, l.cust_privilege_number
       from legacy_cust_info lci
       inner join cust_info ci on ci.row_id = lci.row_id
       where ci.cust_privilege_number != lci.cust_privilege_number
    )
    

    UPD2. To considering NULLs in ci.cust_privilege_number simply add alternative condition into the query:

    where (
      ci.cust_privilege_number != lci.cust_privilege_number 
      or (
       ci.cust_privilege_number is null 
       and lci.cust_privilege_number is not null
      )
    )