Search code examples
oracleplsqloracle11gsql-updatewindow-functions

Oracle PLSQL update query for eliminating duplicate entries using sequence


I have a column ref_key in my table events in oracle database (plsql) where ref_key i am generating from creation_time in format

update events set ref_key= 'EV_'|| TOCHAR(creation_time, 'YYMMDD_HH24MISS');

now the problems is events are generated by batches from files so creation time for many rows can be same. There could be any number of duplicates in this column and I want it to be unique. I already hada primary key, this is non-id field added recently and existing data needs to be updated to have a unique non null human readable value in this ref_key column. I am getting data like column A and i want it like column B

Column A                      Column B                         or 
EV_201005_151610              EV_201005_151610                 EV_201005_151610_1
EV_201005_151610              EV_201005_151610_1               EV_201005_151610_2
EV_201005_151610              EV_201005_151610_2               EV_201005_151610_3
EV_201005_151610              EV_201005_151610_3               EV_201005_151610_4
EV_201005_151610              EV_201005_151610_4               EV_201005_151610_5
EV_201005_151711              EV_201005_151711                 EV_201005_151711_1
EV_201005_151711              EV_201005_151711_1               EV_201005_151711_2
EV_201005_151711              EV_201005_151711_2               EV_201005_151711_3

I dont know how to do it. I can get all distinct values of ref_key where count(ref_key) > 1 . Then can append some sequence to it and reset the sequence after value changes, or anything like that. Or may be my first update query itself. Can Anyone please help with query in achieving this objective.


Solution

  • If you have a primary key column, say id, you could do this with a merge statement:

    merge into events e
    using (
        select 
            id, 
            row_number() over(partition by to_char(creation_time, 'YYMMDD_HH24MISS') order by id) rn,
            count(*)     over(partition by to_char(creation_time, 'YYMMDD_HH24MISS')) cnt
        from events
    ) e1
    on (e1.id = e.id)
    when matched then 
        update set e.ref_key = 'EV_' 
            || to_char(creation_time, 'YYMMDD_HH24MISS') 
            || case when e1.cnt > 1 then '_' || to_char(e1.rn) end
    

    Demo on DB Fiddle:

    Sample data:

    ID | CREATION_TIME       | REF_KEY
    -: | :------------------ | :------
     1 | 2020-10-05 11:03:57 | null   
     2 | 2020-10-05 11:03:57 | null   
     3 | 2020-10-04 11:03:57 | null   
     4 | 2020-10-04 11:03:57 | null   
     5 | 2020-10-04 11:03:57 | null   
     6 | 2020-10-03 11:03:57 | null   
    

    Results:

    ID | CREATION_TIME       | REF_KEY           
    -: | :------------------ | :-----------------
     1 | 2020-10-05 11:03:57 | EV_201005_110357_1
     2 | 2020-10-05 11:03:57 | EV_201005_110357_2
     3 | 2020-10-04 11:03:57 | EV_201004_110357_1
     4 | 2020-10-04 11:03:57 | EV_201004_110357_2
     5 | 2020-10-04 11:03:57 | EV_201004_110357_3
     6 | 2020-10-03 11:03:57 | EV_201003_110357