Search code examples
sqlcountgoogle-bigquerysubquery

Big Query match records between tables


I have two tables:

TableA

ID          Gender               BeginDate      
034446         F          2016-01-15T00:00:00
034446         F          2020-02-17T00:00:00
035689         F          2016-01-14T00:00:00
035679         F          2016-01-18T00:00:00
045687         F          2020-05-21T00:00:00

TableB

ID            Gender        Date        
34446         F          2016-01-14
35689         F          2016-01-14
35679         F          2016-01-18

I'm trying to figure out how many records(and which ones)from TableB do not match up with Table A as well as how many do. Both tables have duplicate ID's which is why I need to also use the Date field to match up records. The date field from TableB might be a day or two off between tables. For example, the first row of Table B should match with the first row of Table A, not the second row. There needs to be a statement saying TableA date field is either equal to or between a day or two from the date field of TableB. I have attempted to write a query below but have the dates set equal to eachother.

SELECT a.ID, CONCAT('0',CAST(b.ID AS STRING)), EXTRACT(DATE FROM a.BeginDate) AS date
FROM `dev.tableA` a
LEFT OUTER JOIN `dev.TableB` b
ON a.ID = CONCAT('0',CAST(b.ID AS STRING))
AND EXTRACT(DATE FROM a.BeginDate) = b.Date

Solution

  • If I follow you correctly, you can use exists:

    select b.*
    from tableb b
    where not exists (
        select 1
        from tablea a 
        where a.id = b.id
            and a.begindate >= date_sub(b.date, interval 2 day)
            and a.begindate <= date_add(b.date, interval 2 day)
    )
    

    This brings records of b for which no match exists in a with the same id and a date that falls within +/- two days. You can add the casting on id, if that's really necessary, and adjust the boundaries as needed.

    If you want an overall record count, you can just replace b.* with count(*).