Search code examples
oracle-databaseduplicatesrow-numberanalytic-functionsrowid

Need help in understanding Oracle analytic function


I have the following code to detect duplicates in a single table:

      UPDATE tab
        SET dup = 'Y'
      WHERE ROWID IN
        (SELECT tab_o.ROWID
          FROM tab tab_o,
            (SELECT *
             FROM tab tab_i
             WHERE ROWID IN
              (SELECT ROWID
               FROM
                (SELECT ROWID,
                  ROW_NUMBER() OVER(PARTITION BY a, b, c ORDER BY a, b, c) dupl
                 FROM tab
                 WHERE a IS NOT NULL
                   AND a    = 1
                   AND b    = 1
                   AND c    = 3
                )
               WHERE dupl > 1
              )
          ) res
         WHERE tab_o.a  = res.a
          AND tab_o.b   = res.b
          AND tab_o.c   = res.c
        );

I googled so many websites, and found that most of the people follow this efficient way. But no where i found the proper explanation on how these nested queries works.


Solution

  • Make it simplier. No analytics needed.

    Sample table:

    12:57:37 SYSTEM@dwal> create table dupe_test
                       2  (a number, b number, c number, is_dupe char);
    
    Table created.                                                                           
    
    12:57:50 SYSTEM@dwal> insert all                    
    12:57:50   2    into dupe_test values (1, 1, 1, 'n')
    12:57:50   3    into dupe_test values (1, 1, 1, 'n')
    12:57:50   4    into dupe_test values (1, 1, 1, 'n')
    12:57:50   5    into dupe_test values (1, 2, 1, 'n')
    12:57:50   6    into dupe_test values (1, 2, 1, 'n')
    12:57:50   7    into dupe_test values (1, 2, 1, 'n')
    12:57:50   8  select * from dual;                   
    
    6 rows created.                                     
    

    There it is:

    12:58:17 SYSTEM@dwal> select * from dupe_test;
    
             A          B          C I            
    ---------- ---------- ---------- -            
             1          1          1 n            
             1          1          1 n            
             1          1          1 n            
             1          2          1 n            
             1          2          1 n            
             1          2          1 n            
    
    6 rows selected.                              
    

    Unique values:

    12:59:35 SYSTEM@dwal> select rowid,  t.* 
                       2  from dupe_test t 
                       3  where rowid in (select min(rowid) 
                       4                  from dupe_test 
                       5                  group by a, b, c);
    
    ROWID                       A          B          C I                                                                        
    ------------------ ---------- ---------- ---------- -                                                                        
    AAARN1AABAAAO9JAAD          1          2          1 n                                                                        
    AAARN1AABAAAO9JAAA          1          1          1 n                                                                        
    

    Update and result:

    12:59:51 SYSTEM@dwal> update dupe_test t 
                       2  set is_dupe = 'y' 
                       3  where rowid not in (select min(rowid) 
                       4                      from dupe_test 
                       5                      group by a, b, c); 
    
    4 rows updated.                                                                                                                    
    
    13:00:45 SYSTEM@dwal> select * from dupe_test;                                                                                     
    
             A          B          C I                                                                                                 
    ---------- ---------- ---------- -                                                                                                 
             1          1          1 n                                                                                                 
             1          1          1 y                                                                                                 
             1          1          1 y                                                                                                 
             1          2          1 n                                                                                                 
             1          2          1 y                                                                                                 
             1          2          1 y                                                                                                 
    
    6 rows selected.                                                                                                                   
    

    UPDATE:

    What I am trying to do is I found that one entry in table is repeated in the same table, all such entries will be marked with dupl flag, including the original entry

    Still no analytics needed. Just add having count(*) = 1 in your subquery, so you will update only non-unique rows. Having clause is basically a where-condition for aggregate functions without the need to wrap your query in a subquery. It is executed last.

    11:03:00 SYSTEM@dwal> insert into dupe_test values (1,3,1,'n') -- add some unique row
    11:03:09   2  /                                                                      
    
    1 row created.                                                                       
    
    11:03:10 SYSTEM@dwal> update dupe_test set is_dupe = 'y'                             
    11:03:27   2  where rowid not in                                                     
    11:03:34   3  (select min(rowid) from dupe_test                                      
    11:03:51   4  group by a,b,c                                                         
    11:04:00   5  having count(*) = 1);                                                  
    
    6 rows updated.                                                                      
    
    11:04:06 SYSTEM@dwal> select * from dupe_test;                                       
    
             A          B          C I                                                   
    ---------- ---------- ---------- -                                                   
             1          1          1 y                                                   
             1          1          1 y                                                   
             1          1          1 y                                                   
             1          2          1 y                                                   
             1          2          1 y                                                   
             1          2          1 y                                                   
             1          3          1 n                                                   
    
    7 rows selected.