I am using Standard SQL in Google BigQuery. I'm trying to list the number of tlc yellow trips from Greenwich Village North to Greenwich Village South (These are the zone_id's in taxi_zone_geom table) in December 2018 and total amount. The zone_id's correspond with the pickup_location_id and dropoff_location_id in the tlc_yellow_trips_2018 table.
taxi_zone_geom table:
Row zone_id zone_name borough zone_geom
1 1 Newark Airport EWR POLY
2 3 Allerton/Pelham Gardens Bronx POLYGON((-73.8...
trips_2018 table
Row vendor_id pickup_datetime dropoff_datetime passenger_count trip_distance rate_code store_and_fwd_flag payment_type fare_amount extra mta_tax tip_amount tolls_amount imp_surcharge total_amount pickup_location_id dropoff_location_id
1 2 2018-05-06T21:13:05 2018-05-06T21:35:33 1 9.83 1 N 1 29 0.5 0.5 6.06 0 0.3 36.36 138 65
2 1 2018-05-06T21:59:42 2018-05-06T22:25:17 1 10.7 1 N 1 32 0.5 0.5 6.65 0 0.3 39.95 138 25
3 1 2018-05-06T23:54:05 2018-05-07T00:19:30 2 9.6 1 N 1 30.5 0.5 0.5 6.35 0 0.3 38.15 114 116
This works fine:
SELECT EXTRACT(MONTH FROM pickup_datetime) AS Month,
COUNT(*) AS Dec_trips_GVNorth_to_GVSouth,
SUM(total_amount) as total,
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` AS t
LEFT JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g ON t.pickup_location_id=g.zone_id WHERE g.zone_name='Greenwich Village North'
GROUP BY Month HAVING Month=12
However, I need to link to the drop_off by joining the same table, which I try to do using a secondary join as below:
LEFT JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g2 ON t.dropoff_location_id=g2.zone_id WHERE g2.zone_name='Greenwich Village South'
This results in an error "Syntax error: Expected end of input but got keyword LEFT at [7:1]"
What am I doing wrong? Thanks
I understand this as two joins, with filtering in the WHERE
clause:
SELECT
COUNT(*) AS Dec_trips_GVNorth_to_GVSouth,
SUM(total_amount) as total
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` AS t
INNER JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g1
ON t.pickup_location_id = g1.zone_id
INNER JOIN `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom` AS g2
ON t.dropoff_location_id = g2.zone_id
WHERE
g1.zone_name = 'Greenwich Village North'
AND g2.zone_name = 'Greenwich Village South'
AND pickup_datetime >= date '2018-12-01'
AND pickup_datetime >= date '2019-01-01'
Notes:
Since you want to actually filter on the pickup / dropoff locations, use INNER JOIN
s, not LEFT JOIN
s
I don't see the need for an explicit GROUP BY
clause, nor for a HAVING
clause: you can filter on the target month directly in the WHERE
clause, using half open intervals and literal dates
If the table has data for 2018 only, as its name suggests, then the filter condition on the upper date range is superfluous