Search code examples
sqlgoogle-bigqueryleft-joinpartitioning

Left Join on ROW_NUMBER() OVER (PARTITION


I have two tables, one with sales information and the other with shipping information. Both are at the item level for each order. In other words, each record, in both tables, will only have a quantity of 1 even if there is multiple of the item within the order.

Table 1

order_number product_code cost currency
100 aa $10 USD
100 aa $10 USD
101 bb $15 USD

Table 2

order_number product_code carrier_service shipment_cost
100 aa Carrier A $7
100 aa Carrier B $7
101 bb Carrier C $13

I need to join the tables in order to pull in aspects from each. The challenge is there is not a unique identifier for each record. I initially tried to join the two tables on order_number and product_code however this led to duplicated results (because the lack of uniqueness).

I then tried to add ROW_NUMBER() OVER (PARTITION... to assign a row number and tried to add that condition to the join but I am still failing. The values from the second CTE are not pulling through to the final results but if you were to run it separately, the values populate.

WITH ss AS (
    SELECT 
        ROW_NUMBER() OVER (PARTITION BY product_code) AS row_id,              
        order_number,                                               
        product_code,                               
        cost,
        currency,                                
    FROM sales  
    ORDER BY order_number, product_code, ROW_NUMBER() OVER (PARTITION BY product_code)),
sis AS (
     SELECT 
        ROW_NUMBER() OVER (PARTITION BY product_code) AS row_id,
        order_number,
        product_code,
        carrier_service,
        shipment_cost
      FROM items
      GROUP BY 1,2,3,4,5,6
      ORDER BY order_number, product_code, ROW_NUMBER() OVER (PARTITION BY product_code))
Select 
    ss.order_number,                    
    ss.product_code,                    
    ss.cost
    ss.currency,
    sis.carrier_service,
    sis.shipment_cost,
FROM ss  
LEFT JOIN sis 
 ON ss.order_number=sis.order_number AND ss.product_code=sis.product_code and ss.row_id=sis.row_id

ORDER BY ss.order_number, ss.product_code

Desired Result

order_number product_code cost currency carrier_service shipment_cost
100 aa $10 USD Carrier A $7
100 aa $10 USD Carrier B $7
101 bb $15 USD Carrier C $13

Actual Result

order_number product_code cost currency carrier_service shipment_cost
100 aa $10 USD Null Null
100 aa $10 USD Null Null
101 bb $15 USD Null Null

Am I unable to join on row numbers? Or is there a better approach to this problem?


Solution

  • consider below

    select order_number, product_code, cost, currency, carrier_service, shipment_cost
    from (select *, row_number() over(partition by product_code) rn from table1) t1
    left join (select *, row_number() over(partition by product_code) rn from table2) t2
    using(order_number, product_code, rn)
    order by order_number, product_code     
    

    if applied to sample data in your question - output is

    enter image description here

    Please note: even though I used over(partition by product_code) as it is in your code - I feel that it should rather be over(partition by order_number) - but it is not clear based on your particular question - so I am leaving it up to you while addressing the issue of left join that is the focus of your question