Search code examples
sqlsql-serversql-server-2014

matching groups of rows in two databases


I have the following (simplified) situation in two databases:

    ID       Prog      T         Qt   
|---------|--------|---------|---------|  
|   a     |    1   |   N     |   100   |  
|   b     |    1   |   Y     |   10    |  
|   b     |    2   |   N     |   90    |  
|   c     |    1   |   N     |   25    |  
|   c     |    2   |   Y     |   25    |  
|   c     |    3   |   Y     |   25    |  
|   c     |    4   |   Y     |   25    |  
|---------|--------|---------|---------|  

    ID       Prog      T         Qt   
|---------|--------|---------|---------|  
|   1     |    1   |   Y     |   10    |  
|   1     |    2   |   N     |   90    |  
|   2     |    1   |   Y     |   100   |  
|   3     |    1   |   Y     |   100   |  
|   4     |    1   |   Y     |    50   |  
|   4     |    2   |   Y     |    25   |  
|   4     |    3   |   Y     |    25   |      
|---------|--------|---------|---------|  

I need to compare groups of rows (primary keys are ID and Prog), to find out which groups of rows represent the same combination of factors (not considering ID).

In the example above, ID "b" in the first table and ID "1" in the second have the same combination of values for Prog, T and Qt, while no one else can be considered exactly the same between the 2 dbs (while ID "2" and "3" in the second table are equal, I'm not interested in comparing in the same db).

I hope I explained everything.


Solution

  • A join and aggregation should work for this purpose:

    select t1.id, t2.id
    from (select t1.*, count(*) over (partition by id) as cnt
          from t1
         ) t1 join
         (select t2.*, count(*) over (partition by id) as cnt
          from t2
         ) t2
         on t1.prog = t2.prog and t1.T = t2.T and t1.Qt = t2.Qt and t1.cnt = t2.cnt
    group by t1.id, t2.id, t1.cnt
    having count(*) = t1.cnt;
    

    This is a little tricky. The subqueries count the number of rows for each id in each table. The on clause gets matches between the three columns -- and checks that the ids have the same count. The group by and having then get rows where number of matching rows is the total number of rows.