I am trying to find duplicates in the table and wanted to add a new flag column. Sample data below :
Column A Column B
1 888
1 999
1 777
1 777
2 444
2 444
3 555
4 222
-5 333
-5 672
-5 045
Desired output :
Column A Column B Flag_output
1 888 DUPLICATE
1 999 DUPLICATE
1 777 NULL
1 777 NULL
2 444 NULL
2 444 NULL
3 555 NULL
4 222 NULL
-5 333 DUPLICATE
-5 672 DUPLICATE
-5 045 DUPLICATE
case 1: When Column A has the same values with different values in Column B (e.g. value 1 in Column A) - should be marked as DUPLICATE
case 2: When Column A has the same values with the same values in Column B in multiple rows(e.g. value 2 in column A) - should be marked as NULL
case 3: When Column A and Column B has unique values (e.g. value 3 and 4 in Column A) - Also should be marked as NULL
Any help would be appreciated.
Based on your description, I can phrase your conditions as when the minimum and maximum values of b
are different for a
, then label as 'duplicate'
.
For this, use window functions:
select t.*,
(case when min(b) over (partition by a) <> max(b) over (partition by a)
then 'duplicate'
end) as flag_output
from t;
Based on the data, you seem to want:
select t.*,
(case when count(*) over (partition by a, b) = 1 and
count(*) over (partition by a) > 1
then 'duplicate'
end) as flag_output
from t;
That is, to flag singleton values only when there is more than one value for a
.