Search code examples
sqlhiveunionhiveqlfull-outer-join

Aggregating Amounts from Different Tables


I have a table t1 like this:

store_id    industry_id    cust_id    amount     gender     age
1           100            1000       1.00       M          20
2           100            1000       2.05       M          20
3           100            1000       3.15       M          20
4           200            2000       5.00       F          30
5           200            2000       6.00       F          30

And another table t2 that looks like this:

store_id    industry_id    cust_id    amount   
10          100            1000       10.00   
20          200            2000       11.00

Suppose we want to construct a table with all of the transactions for a given customer in each industry. In other words, something like this:

store_id.   industry_id.   cust_id.   amount
1           100            1000       1.00
2           100            1000       2.05
3           100            1000       3.15
4           200            2000       5.00
5           200            2000       6.00
10          100            1000       10.00
20          200            2000       11.00

I'm trying to do this by using a join and a coalesce statement in the query below, but it doesn't work because each row has an entry for the amount column in t1, i.e., there aren't any NULL values for the coalesce statement to use. What's the best way to do this using a join?

SELECT
a.store_id,
a.industry_id,
a.cust_id,
COALESCE(a.amount,b.amount,0) AS amount
FROM t1 a
LEFT JOIN (SELECT store_id AS store_id_2, industry_id AS industry_id_2, cust_id AS cust_id_2, amount FROM t2) b 
ON a.cust_id=b.cust_id_2 AND a.industry_id=b.industry_id_2;

This query results in:

store_id    industry_id    cust_id    amount     
1           100            1000       1.00  
2           100            1000       2.05  
3           100            1000       3.15  
4           200            2000       5.00 
5           200            2000       6.00 

Solution

  • For this dataset union all seems good enough:

    select store_id, industry_id, cust_id, amount from t1
    union all
    select store_id, industry_id, cust_id, amount from t2
    

    I am speculating that the same store / industry / customer tuple may appear in both tables and you want just one row in the result with the sum the corresponding amounts. If so, you might be interested in a full join:

    select
        coalesce(t1.store_id, t2.store_id) store_id,
        coalesce(t1.industry_id, t2.industry_id) industry_id,
        coalesce(t1.cust_id, t2.cust_id) cust_id,
        coalesce(t1.amount, 0) + coalesce(t2.amount, 0) amount
    from t1
    full join t2 
        on t2.store = t1.store and t2.industry = t1.industry and t2.cust_id = t1.cust_id