Search code examples
sqlgoogle-bigquerysubquerydata-analysis

Unexpected result SQL subquery


I have two pieces of code that should basically do the same thing, but their results differ, and I cannot understand why.

This is the first code:

SELECT
    subquery.start_station_id,
    subquery.avg_duration
FROM
    (
    SELECT
        start_station_id,
        AVG(tripduration) as avg_duration
FROM bigquery-public-data.new_york_citibike.citibike_trips
GROUP BY start_station_id) as subquery
ORDER BY avg_duration DESC;

This is the second code:

SELECT
  start_station_id,
  AVG(tripduration) as avg_duration
FROM
  bigquery-public-data.new_york.citibike_trips
GROUP BY
  start_station_id
ORDER BY
  avg_duration DESC;

Both of these codes are supposed to find the average trip duration of each station. As per my understanding, the first code is basically doing the same thing as the second code but using a subquery. Turns out I am wrong, because the results of both these pieces of code are different. What is it that I am not seeing here?

This is the table being used

This is the result of the first code

This is the result from the second code

The first piece of code is the actual correct answer to the problem. I attempted to solve this problem without using subquery because it is a pretty straightforward problem. You have to find out the average trip duration by each station. As per my understanding, you just use the AVG function to find out the average trip duration and then group it by station ID. But both these codes have very different results. For the life of me, I can't understand why.

Any help is deeply appreciated, Thank you.


Solution

  • It seems in your SELECT statement you are not using the exact same table in both queries which may be the reason the result is not the same

    FROM bigquery-public-data.new_york_citibike.citibike_trips

    FROM bigquery-public-data.new_york.citibike_trips