Search code examples
sqlsql-serverunique

SQL - comparing groups to be unique


I am trying to check uniqueness of sets I created in 'group by' clause.

Let's say I have a table called RelevantKeys:

KeyID | KeyValue | MainID  
1     | a        |  C1  
1     | a        |  C2  
1     | a        |  C3  
2     | b        |  C1  
2     | b        |  C2  
2     | a        |  C3  

The KeyIDs with their values form something like a composite key to identify the MainId. I want to check, whether there exist any duplicates.

In the example example above the expected answer is true, because for C1 and C2 have the same KeyValues for all KeyIds:
C1,C2: (a,b)
C4: (a,a)

However, I want a negative answer for the folowing - e.g. if they don't share the whole composition of values, sharing the same value in the first index does not count as duplicate.

KeyID | KeyValue | MainID  
1     | a        |  C1  
1     | a        |  C3  
2     | b        |  C1  
2     | a        |  C3  

It seems logical to group it, but I don't know how to check for uniqueness of all the rows among all the groups. An aggregate function won't work as I need to compare the whole groups, not just the rows inside each.

SELECT R.MainID
FROM RelevantKeys R
GROUP BY R.MainID

How can I achieve this? Note that the number of KeyIDs is not fixed.
Thanks for your help!


Solution

  • You can use EXCEPT to check for records that exist in one set and not another. By using two such checks you can make sure that all the records in one group match all the records in another group. The following lists all the keys and their matches:

    with mainKeys (MainId) as
    (
      select distinct MainId
      from RelevantKeys
    )
    select k1.MainId as MainId1, k2.MainId as MainId2
    from MainKeys k1
    cross join MainKeys k2
    where k1.MainId != k2.MainId
    and not exists
    (
      select KeyId, KeyValue
      from RelevantKeys r
      where r.MainId = k1.MainId
    
      except
    
      select KeyId, KeyValue
      from RelevantKeys r
      where r.MainId = k2.MainId
    )
    and not exists
    (
      select KeyId, KeyValue
      from RelevantKeys r
      where r.MainId = k2.MainId
    
      except
    
      select KeyId, KeyValue
      from RelevantKeys r
      where r.MainId = k1.MainId
    )
    

    Note that for each match two rows are returned, one for each MainId. Also, this assumes that the ordering doesn't matter and that no key/value pairs are duplicated within a MainId group.

    Sql Fiddle