Search code examples
sqlsql-servercountsubqueryaggregate-functions

Remove duplicates based on a value from a different column


In MS Visual Studio, I have following table 1, where I want to remove duplicate caseNum based on PrimIndicator column. If case number has a duplicate caseNum, I want to keep the row with "True" PrimIndicator, if caseNum is not duplicate then keep the CaseNum.

Table1:

ID  CaseNum PrimIndicator
1   AAA-123 TRUE
2   AAA-123 FALSE
3   BBB-546 
4   CCC-753 TRUE
5   CCC-753 
6   ABC-347 
7   DEF-546 TRUE
8   DEF-346 

Want Table2:

ID  CaseNum PrimIndicator
1   AAA-123 TRUE
3   BBB-546 
4   CCC-753 TRUE
6   ABC-347 
7   DEF-546 TRUE

So far I have tried this but it seems that max() function is not working if the PrimIndicator is blank.

Select ID, distinct CaseNum, Max(PrimIndicator)
from Table1
group by CaseNum, PrimIndicator

Solution

  • If you want only one row per casenum, I would suggest row_number():

    select t.*
    from (select t.*,
                 row_number() over (partition by casenum
                                    order by case when primeindicator = 'true' then 1 else 2 end
                                   ) as seqnum
          from t
         ) t
    where seqnum = 1;
    

    This guarantees one row per casenum in the result set.

    Given your data, you could also use where logic:

    select t.*
    from t
    where t.primeindicator = 'true' or 
          not exists (select 1
                      from t t2
                      where t2.casenum = t.casenum and
                            t2.primeindicator = 'true'
                     );
    

    This does not guarantee one row per casenum -- but it does work for the data you have provided.