Search code examples
sqlduplicatescaseimpalahue

SQL : Is there any way to find Duplicates and flag them as new column with case


I am trying to find duplicates in the table and wanted to add a new flag column. Sample data below :

Column A        Column B
1                888
1                999
1                777
1                777
2                444
2                444
3                555
4                222
-5               333
-5               672
-5               045        

Desired output :

Column A        Column B         Flag_output
1                888              DUPLICATE
1                999              DUPLICATE
1                777              NULL
1                777              NULL
2                444              NULL
2                444              NULL
3                555              NULL
4                222              NULL
-5               333              DUPLICATE
-5               672              DUPLICATE
-5               045              DUPLICATE

case 1: When Column A has the same values with different values in Column B (e.g. value 1 in Column A) - should be marked as DUPLICATE

case 2: When Column A has the same values with the same values in Column B in multiple rows(e.g. value 2 in column A) - should be marked as NULL

case 3: When Column A and Column B has unique values (e.g. value 3 and 4 in Column A) - Also should be marked as NULL

Any help would be appreciated.


Solution

  • Based on your description, I can phrase your conditions as when the minimum and maximum values of b are different for a, then label as 'duplicate'.

    For this, use window functions:

    select t.*,
           (case when min(b) over (partition by a) <> max(b) over (partition by a)
                 then 'duplicate'
            end) as flag_output
    from t;
    

    Based on the data, you seem to want:

    select t.*,
           (case when count(*) over (partition by a, b) = 1 and
                      count(*) over (partition by a) > 1
                 then 'duplicate'
            end) as flag_output
    from t;
    

    That is, to flag singleton values only when there is more than one value for a.