Search code examples
datetimejoingoogle-bigquery

BigQuery join on closest date match


I'm trying to join two tables in BigQuery based on an id and the closest date match.

Transaction Table:

transactionId dateTime productId
4a50665e 2022-05-13T14:12:55 abc
7d5889cd 2022-05-22T16:10:21 abc

Product Log Table (log of when each productId is updated to a new version):

dateTime productId version
2022-05-19T06:37:24 abc v2
2022-05-12T04:38:23 xyz v1
2022-05-10T09:57:54 abc v1

I want to add a version column to the transaction table by looking up from the product log table, based on productId match and the dateTime match from the product table. To get the active product version at the time of the transaction.

Desired Result:

transactionId dateTime productId version
4a50665e 2022-05-13T14:12:55 abc v1
7d5889cd 2022-05-22T16:10:21 abc v2

Something like this

SELECT
  t.*,
  p.version
FROM
  transaction_table t
LEFT JOIN
  product_log_table p
ON
  t.productId = p.productId AND 
  t.datetime < p.dateTime

But that doesn't work. I've tried searching a lot and tried a number of solutions but can't get anything to work. Should be simple? How do I do this?

Thanks for any help.


Solution

  • Consider below approach

    select any_value(t).*, 
      string_agg(version, '' order by p.dateTime desc limit 1) as version
    from Transaction_Table t
    join Product_Log_Table p
    on t.productId = p.productId
    and t.dateTime >= p.dateTime 
    group by format('%t', t)                  
    

    if applied to sample data in your question - output is

    enter image description here