Search code examples
sqloracleduplicatesoracle-data-integrator

Delete Duplicate in I$ table in ODI


We have a load plan in ODI. We get a repeating error in some on our scenarios due to duplicate records in the I% table. What we do is manually run the script every time the load plan fails.

DELETE FROM adw12_dw. I$_1558911580_4
      WHERE     (EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID) IN
                    (  SELECT EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                         FROM adw12_dw . I$_1558911580_4
                     GROUP BY EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                       HAVING COUNT (1) > 1)
            AND ROWID NOT IN
                    (  SELECT MIN (ROWID)
                         FROM adw12_dw . I$_1558911580_4
                     GROUP BY EFFECTIVE_FROM_DT,
                              DATASOURCE_NUM_ID,
                              INTEGRATION_ID
                       HAVING COUNT (1) > 1)
                
                commit;

Is there a way to automate the deletion of duplicate records in the Integration table?


Solution

  • If you have duplicates in the source, best would be to handle that in the logic of the mapping. What could work is to add an expression component to add a row_rank column using an analytical function to rank the duplicates : row_number() over (partition by EFFECTIVE_FROM_DT, DATASOURCE_NUM_ID, INTEGRATION_ID order by ROWID). You can then add a filter with the condition row_rank = 1.

    If you prefer to do a delete after inserting, you can edit the IKM and add the delete step before loading the target table.

    You could also divide the integration in 3 different steps :

    1. a mapping that would load a staging table instead of your final target table, with the duplicate
    2. an ODI procedure that would perform the delete to remove the duplicates in the staging table
    3. a mapping that would load the data from the staging area to the target table