I have the following code to detect duplicates in a single table:
UPDATE tab
SET dup = 'Y'
WHERE ROWID IN
(SELECT tab_o.ROWID
FROM tab tab_o,
(SELECT *
FROM tab tab_i
WHERE ROWID IN
(SELECT ROWID
FROM
(SELECT ROWID,
ROW_NUMBER() OVER(PARTITION BY a, b, c ORDER BY a, b, c) dupl
FROM tab
WHERE a IS NOT NULL
AND a = 1
AND b = 1
AND c = 3
)
WHERE dupl > 1
)
) res
WHERE tab_o.a = res.a
AND tab_o.b = res.b
AND tab_o.c = res.c
);
I googled so many websites, and found that most of the people follow this efficient way. But no where i found the proper explanation on how these nested queries works.
Make it simplier. No analytics needed.
Sample table:
12:57:37 SYSTEM@dwal> create table dupe_test
2 (a number, b number, c number, is_dupe char);
Table created.
12:57:50 SYSTEM@dwal> insert all
12:57:50 2 into dupe_test values (1, 1, 1, 'n')
12:57:50 3 into dupe_test values (1, 1, 1, 'n')
12:57:50 4 into dupe_test values (1, 1, 1, 'n')
12:57:50 5 into dupe_test values (1, 2, 1, 'n')
12:57:50 6 into dupe_test values (1, 2, 1, 'n')
12:57:50 7 into dupe_test values (1, 2, 1, 'n')
12:57:50 8 select * from dual;
6 rows created.
There it is:
12:58:17 SYSTEM@dwal> select * from dupe_test;
A B C I
---------- ---------- ---------- -
1 1 1 n
1 1 1 n
1 1 1 n
1 2 1 n
1 2 1 n
1 2 1 n
6 rows selected.
Unique values:
12:59:35 SYSTEM@dwal> select rowid, t.*
2 from dupe_test t
3 where rowid in (select min(rowid)
4 from dupe_test
5 group by a, b, c);
ROWID A B C I
------------------ ---------- ---------- ---------- -
AAARN1AABAAAO9JAAD 1 2 1 n
AAARN1AABAAAO9JAAA 1 1 1 n
Update and result:
12:59:51 SYSTEM@dwal> update dupe_test t
2 set is_dupe = 'y'
3 where rowid not in (select min(rowid)
4 from dupe_test
5 group by a, b, c);
4 rows updated.
13:00:45 SYSTEM@dwal> select * from dupe_test;
A B C I
---------- ---------- ---------- -
1 1 1 n
1 1 1 y
1 1 1 y
1 2 1 n
1 2 1 y
1 2 1 y
6 rows selected.
UPDATE:
What I am trying to do is I found that one entry in table is repeated in the same table, all such entries will be marked with dupl flag, including the original entry
Still no analytics needed. Just add having count(*) = 1
in your subquery, so you will update only non-unique rows. Having
clause is basically a where-condition for aggregate functions without the need to wrap your query in a subquery. It is executed last.
11:03:00 SYSTEM@dwal> insert into dupe_test values (1,3,1,'n') -- add some unique row
11:03:09 2 /
1 row created.
11:03:10 SYSTEM@dwal> update dupe_test set is_dupe = 'y'
11:03:27 2 where rowid not in
11:03:34 3 (select min(rowid) from dupe_test
11:03:51 4 group by a,b,c
11:04:00 5 having count(*) = 1);
6 rows updated.
11:04:06 SYSTEM@dwal> select * from dupe_test;
A B C I
---------- ---------- ---------- -
1 1 1 y
1 1 1 y
1 1 1 y
1 2 1 y
1 2 1 y
1 2 1 y
1 3 1 n
7 rows selected.