Search code examples
sqlcountgoogle-bigquerysuminner-join

Performing 2 JOINs on same table


I am using Standard SQL in Google BigQuery. I'm trying to list the number of tlc yellow trips from Greenwich Village North to Greenwich Village South (These are the zone_id's in taxi_zone_geom table) in December 2018 and total amount. The zone_id's correspond with the pickup_location_id and dropoff_location_id in the tlc_yellow_trips_2018 table.

taxi_zone_geom table:

Row zone_id zone_name   borough zone_geom
1   1   Newark Airport  EWR POLY
2   3   Allerton/Pelham Gardens Bronx   POLYGON((-73.8...

trips_2018 table

Row vendor_id   pickup_datetime dropoff_datetime    passenger_count trip_distance   rate_code   store_and_fwd_flag  payment_type    fare_amount extra   mta_tax tip_amount  tolls_amount    imp_surcharge   total_amount    pickup_location_id  dropoff_location_id 
1   2   2018-05-06T21:13:05 2018-05-06T21:35:33 1   9.83    1   N   1   29  0.5 0.5 6.06    0   0.3 36.36   138 65  
2   1   2018-05-06T21:59:42 2018-05-06T22:25:17 1   10.7    1   N   1   32  0.5 0.5 6.65    0   0.3 39.95   138 25  
3   1   2018-05-06T23:54:05 2018-05-07T00:19:30 2   9.6 1   N   1   30.5    0.5 0.5 6.35    0   0.3 38.15   114 116 

This works fine:

SELECT EXTRACT(MONTH FROM pickup_datetime) AS Month,
COUNT(*) AS Dec_trips_GVNorth_to_GVSouth,
SUM(total_amount) as total,
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` AS t 
LEFT JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g ON t.pickup_location_id=g.zone_id WHERE g.zone_name='Greenwich Village North'
GROUP BY Month HAVING Month=12

However, I need to link to the drop_off by joining the same table, which I try to do using a secondary join as below:

LEFT JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g2 ON t.dropoff_location_id=g2.zone_id WHERE g2.zone_name='Greenwich Village South' 

This results in an error "Syntax error: Expected end of input but got keyword LEFT at [7:1]"

What am I doing wrong? Thanks


Solution

  • I understand this as two joins, with filtering in the WHERE clause:

    SELECT 
        COUNT(*) AS Dec_trips_GVNorth_to_GVSouth,
        SUM(total_amount) as total
    FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` AS t 
    INNER JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g1 
        ON t.pickup_location_id = g1.zone_id 
    INNER JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g2
        ON t.dropoff_location_id = g2.zone_id 
    WHERE 
        g1.zone_name = 'Greenwich Village North'
        AND g2.zone_name = 'Greenwich Village South'
        AND pickup_datetime >= date '2018-12-01'
        AND pickup_datetime >= date '2019-01-01'
    

    Notes:

    • Since you want to actually filter on the pickup / dropoff locations, use INNER JOINs, not LEFT JOINs

    • I don't see the need for an explicit GROUP BY clause, nor for a HAVING clause: you can filter on the target month directly in the WHERE clause, using half open intervals and literal dates

    • If the table has data for 2018 only, as its name suggests, then the filter condition on the upper date range is superfluous