Search code examples
sqlt-sqldelta

Comparing differences (error) between two tables in SQL


I want to calculate a delta between the rows of two tables with the exact same structure, but not necessarily the same data.

table 1 (predictions)

Id | Name | GP | G | A

table 1 (actual data)

Id | Name | GP | G | A

result (matched by Id):

Id | (GP1-GP2) AS DeltaGP | (G1-G2) AS DeltaG | (A1-A2) AS DeltaA |

Catch my drift? This is to caclulate the error of a prediction model in SSAS

Here is my code:

    SELECT P.[Player_id]
          ,P.[PlayerName]
          ,sum(P.[Games Played])-sum(S.[GamesPlayed]) AS GP
          ,sum(P.[Total Goals])-sum(s.[TotalGoals]) AS Goals
          ,sum(P.[Total Assists])-sum(s.[TotalAssists]) AS Assists   
FROM [PredictionsPlayersHistory] as P   
JOIN [V_StatsPlayers] AS S on p.pLAYER_id = s.Player_id    
where S.LastPlayedDate >= '2010-02-01' and P.Player_id
    = 8471675   group by P.[Player_id],P.[PlayerName]   
order by Goals desc, Assists desc, GP asc

problem is the results arent right:

Predictions (SUM)

PlayerName GP Goals Assists

Sidney Crosby 56 28 34

Actual data (SUM)

PlayerName GP Goals Assists

Sidney Crosby 26 17 24

Results

Sidney Crosby 28 -42 -98


Solution

  • You need to join the two tables on the id column:

    SELECT a.Id,a.Name, a.GP - p.GP AS DeltaGP, a.G - p.G AS DeltaG,  a.A - p.A AS DeltaA
    FROM dbo.table1 AS a --actual values
    JOIN dbo.table2 AS p --predictions
    ON a.Id = p.Id;
    

    For an introduction to SQL Server Joins check out my post series here: http://sqlity.net/en/1146/a-join-a-day-introduction/


    EDIT: If you have multiple rows per player in each table you need to aggregate them before you join:

    SELECT  a.Id,
            a.Name,
            a.GP - p.GP AS DeltaGP,
            a.G - p.G AS DeltaG,
            a.A - p.A AS DeltaA
    FROM    (
              SELECT  Id,
                      Name,
                      SUM(GP) AS GP,
                      SUM(G) AS G,
                      SUM(A) AS A
              FROM    dbo.table1
              GROUP BY Id,
                      Name
            ) AS a --actual values
    JOIN    (
              SELECT  Id,
                      SUM(GP) AS GP,
                      SUM(G) AS G,
                      SUM(A) AS A
              FROM    dbo.table2
              GROUP BY Id
            ) AS p
            --predictions
      ON a.Id = p.Id;
    

    Depending on you requirements you might want to use AVG instead of SUM.