Search code examples
sqldatetimegoogle-bigquerygreatest-n-per-groupdate-arithmetic

Select Min Datediff from joined tables


I'm currently working in BigQuery. I have two tables:

Table A {
  id int 
  b_id int
  date timestamp
  }
  
Table B {
  id int 
  long float
  lat float
  date timestamp
}

For one b_id in A there are multiple entries in B. I want to efficiently join both tables and only use the entry in B where A.b_id = B.id and the difference between the timestamps is minimal.

My current solution is:

SELECT A.*, B.*, ABS(DATETIME_DIFF(DATETIME(A.date), DATETIME(B.date), SECOND)) as timeDiff
FROM A
INNER JOIN B
ON A.b_id = B.id    
WHERE ABS(DATETIME_DIFF(DATETIME(A.date), DATETIME(B.date), SECOND)) = 
(
SELECT MIN(ABS(DATETIME_DIFF(DATETIME(B2.date), DATETIME(A2.date), SECOND)))
FROM B as B2 
INNER JOIN A as A2
ON B2.driverId = A2.driverId
GROUP BY A2.driverId, B2.rideId
)

Problem is that it takes way to much time for a single entry. Any suggestions on how to improve the query?


Solution

  • You could use window functions:

    select *
    from (
        select a.*, b.*, 
            row_number() over(
                partition by a.b_id
                order by abs(datetime_diff(datetime(a.date), datetime(b.date), second))
            ) rn
        from a
        inner join b on a.b_id = b.id  
    ) t
    where rn = 1