Search code examples
sqlsubquerysnowflake-cloud-data-platformleft-join

Snowflake-SQL: Summing values prior to Left Join without Summarizing/Grouping Data


I am trying to Left Join Table A with Table B on ID # to compare Costs vs Payments. However, in this particular instance, I do not want to group or summarize the data. I need all rows (and line level detail) from Table A without duplicating. Table B has some duplicate IDs, so I want to pull in the sum of the Payments from Table B for each row in Table A.

First I tried a simple Left Join, but this created duplicate rows in Table A.

Select 
A.ID, A.Part_Description, A.Part_Cost, B.Total_Payment 
From
TableA as A
Left Join TableB as B
on A.ID = B.ID

Which gave me this:

A.ID A.Part_Description A.Part_Cost B.Total_Payment
ID_1 Part A 3 1.2
ID_1 Part B 2 1.2
ID_1 Part A 3 3.8
ID_1 Part B 2 3.8

I've looked into Partitioning with Row_Number functions as well, but that will give me either the most recent or largest Payment from Table B. Grouping the data removes details from Table A that I want to keep.

Since B.ID shows up several times in Table B with different B.Payment values, I need to get the sum of B.Payments prior to the join, preventing duplicate rows from TableA in the results.

Expected Result

A.ID A.Part_Description A.Part_Cost B.Total_Payment
ID_1 Part A 3 5
ID_1 Part B 2 5
ID_2 Part A 3 12
ID_2 Part B 2 12
ID_2 Part C 7 12

I would like to use the results to compare the Costs vs Payments, so occasionally, I expect B.Total_Cost to be different than sum(A.Part_Cost) for a specific ID.

Any ideas? I'm pretty new to SQL and joining tables... I suspect that I may need to use a subquery, but I'm struggling a bit.


Solution

  • I believe you want only a simple GROUP BY

    Select 
    A.ID, A.Part_Description, MAX(A.Part_Cost), SUM(B.Total_Payment) as Total_Payment
    From
    TableA as A
    Left Join TableB as B
    on A.ID = B.ID
    GROUP BY A.ID, A.Part_Description