I have two pieces of code that should basically do the same thing, but their results differ, and I cannot understand why.
This is the first code:
SELECT
subquery.start_station_id,
subquery.avg_duration
FROM
(
SELECT
start_station_id,
AVG(tripduration) as avg_duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_id) as subquery
ORDER BY avg_duration DESC;
This is the second code:
SELECT
start_station_id,
AVG(tripduration) as avg_duration
FROM
bigquery-public-data.new_york.citibike_trips
GROUP BY
start_station_id
ORDER BY
avg_duration DESC;
Both of these codes are supposed to find the average trip duration of each station. As per my understanding, the first code is basically doing the same thing as the second code but using a subquery. Turns out I am wrong, because the results of both these pieces of code are different. What is it that I am not seeing here?
This is the result of the first code
This is the result from the second code
The first piece of code is the actual correct answer to the problem. I attempted to solve this problem without using subquery because it is a pretty straightforward problem. You have to find out the average trip duration by each station. As per my understanding, you just use the AVG function to find out the average trip duration and then group it by station ID. But both these codes have very different results. For the life of me, I can't understand why.
Any help is deeply appreciated, Thank you.
It seems in your SELECT statement you are not using the exact same table in both queries which may be the reason the result is not the same
FROM bigquery-public-data.new_york_citibike.citibike_trips
FROM bigquery-public-data.new_york.citibike_trips