Aim was to write a SQL query to return ride details from a public database.
While going through a data analytics course, I stumbled across this problem. the code used is correct as the instruction provided was the same however the result I obtained was different. The first row returned a blank row with just the number of trips mentioned.
This is the code I used
SELECT
usertype,
CONCAT (start_station_name," to ",end_station_name) AS Route,
Count(*) AS num_trips,
Round(Avg(CAST(tripduration as INT64)/60),2) AS duration,
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY
start_station_name, end_station_name, Usertype
ORDER BY
num_trips DESC
Limit 10
That happens because you actually have 5.8M rows with empty usertype, start_station_name and end_station_name and as you are sorting in DESC order they show as first line in the output.
Assuming that you only need to count when there is at least a start_station_name then you can add a WHERE clause
SELECT
usertype,
CONCAT (start_station_name," to ",end_station_name) AS Route,
Count(*) AS num_trips,
Round(Avg(CAST(tripduration as INT64)/60),2) AS duration,
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE start_station_name <> ''
GROUP BY
start_station_name, end_station_name, Usertype
ORDER BY
num_trips DESC
Limit 10