Search code examples
sqlsuminner-joinaggregate-functionsfull-outer-join

SQL - How to join 2 tables with combined SUM function ? (Incorrect results currently)


In SQL I'm trying to combine multiple tables and grab the SUM of expenses per person, and sort those by highest total expense first. I have 3 tables:

  • test1 (from grocery store #1)
  • test2 (from grocery store #2),
  • junction1 (one that I just created to somehow try to connect test1 and test2 together)

I cannot edit test1 and test2 in the production environment. I created junction1 as a bridge to connect test1 and test2. I can modify columns/content in junction1. The IDs of test1 and test2 may change in the future (right now they are the same).

Desired result:

Desired result table

I need to do a full join on all tables, since I want to include all personnel from both tables. test1 and test2 are independent, as some people only shop in test1 locations and some only shop in test2 locations. Also to sort by Total SUM of both tables I tried:

ORDER BY SUM(Grocery1 + Grocery2) DESC

No luck.

A SUM select statement (no joins) works:

select junction1.Name1, SUM(Amount) AS Grocery1 
from test1 
FULL JOIN junction1 on junction1.ID1= test1.ID1 
GROUP BY junction1.Name1 ORDER BY Grocery1 DESC;

Part 1

But when I join the table(s):

select junction1.Name1, SUM(test1.Amount) AS Grocery1, SUM(test2.Amount) AS Grocery2
from test1
FULL JOIN junction1 ON test1.ID1 = junction1.ID1
FULL JOIN test2 ON test2.ID2 = junction1.ID2
GROUP BY junction1.Name1

It gives:

Incorrect Results

The data is off in both columns. Andy should only have $400 for Grocery1. It looks like it's multiplying it instead of adding it. I tried to divide by 3, which helps some of the people with 3 entries, but that's probably not what I want.


Solution

  • I don't think you want a full join. You would if both tables could be joined together directly, but the fact that we need to to through the junction table changes the situation.

    We can start from the junction table, then union both grocery tables (while separating the original columns) and bring them with a left join on both possibles matches. The last step is aggregation:

    select j.name1, sum(t.amount1) as grocery1, sum(t.amount2) as grocery2
    from junction1 j
    left join (
        select id1, null as id2, amount as amount1, null as amount2 from test1
        union all select null, id2, null, amount from test2
    ) t on t.id1 = j.id1 or t.id2 = j.id2
    group by j.name1
    

    Using union avoids the "row multiplication" issue that you are seeing. On the other hand, the left join ensures that records from both tables are preserved, regardless of whether or not the same person actually shopped at both locations.