Search code examples
google-bigquerybigdatainner-joinjoin

BigQuery Update merge on table with multiple rows


I have the following:

Table A:

|uid|info|..
|123|null|..



Table B:
|uid|goodinfo|timestamp|
|123  |  3     |2019-12-12
|123  |  5     |2019-01-12
|234  |  11    |2019-10-12

When I try to run the update statement I always get the "UPDATE/MERGE must match at most one source row for each target row" error because in Table B I get multiple rows and I do not have any way of making the join more specific than this.

I have tried:

UPDATE `Table A` a
SET info = (select goodinfo from `Table B` where uid=123
ORDER BY lastmodifieddate DESC
LIMIT 1) b
WHERE 
a.info IS NULL AND
a.user_id=123

-- this approach works but because in the SubQuery I do not have access to Table A I cannot generalize it to something like:

SET info = (select goodinfo from `Table B` where uid=a.uid
ORDER BY lastmodifieddate DESC
LIMIT 1) b

-- this throws an error saying that he doesn't know whom "a.uid" is

Then I've tried using the Merge from BigQuery:

MERGE `Table A` a 
USING (
  select goodinfo,uid from `Table B` 
  ORDER BY lastmodifieddate DESC
  LIMIT 1
) b
ON a.uid = b.uid 
WHEN MATCHED and a.info is null and DATE(a.timestamp) = "2019-12-12" THEN
  UPDATE SET a.info = b.goodinfo

-- This query actually completes with success but no rows are modified for a reason I did not found yet

Then I've tried :

UPDATE `Table A` a 
SET a.info = b.goodinfo
FROM `Table B` b
WHERE a.uid = b.uid
and DATE(a.timestamp) = "2019-12-12"
and a.info IS NULL
//here I get the same error and I cannot filter the data from Table B and get the same error

Any thoughts on updating the data in a generic way and somehow filtering the data from Table B and get only the value "3" from goodinfo when joining?

I was also thinking of doing a :

WITH filtered_table_b(
  select uid, goodinfo from Table B
  ORDER BY lastmodifieddate DESC
  LIMIT 1
)

But this doesn't help, since I somehow need to select the last goodinfo based on timestamp for each user

Thanks


Solution

  • This is a standard SQL you can use:

    WITH data AS (
    select '123' as uid, 3 as goodinfo, DATE('2019-12-12') as timestamp union all
    select '123' as uid, 5 as goodinfo, DATE('2019-01-12') as timestamp union all
    select '234' as uid, 11 as goodinfo, DATE('2019-10-12') as timestamp 
    ),
    filterData AS (
    select uid, max(timestamp) maxTimestamp from data
    group by uid
    )
    
    select data.uid, goodinfo, filterData.maxTimestamp as  maxTimestamp 
    from data inner join filterData on data.uid = filterData.uid and data.timestamp = filterData.maxTimestamp
    

    This is the output of the above:

    enter image description here