Search code examples
sqlgroup-bygoogle-bigqueryorder-of-execution

Does the GROUP BY occur after the operations that occur in the SELECT statement? (BigQuery)


From what I have read on e.g. https://www.sisense.com/blog/sql-query-order-of-operations/ I understand that SQL statements will execute in the order:

1: FROM, 2: WHERE, 3: GROUP BY, 4: HAVINIG, 5: SELECT, 6: ORDER BY, 7: LIMIT

This makes sense to me, except for the fact that you can GROUP BY new columns created in the SELECT statement. e.g.

WITH data_sample AS (
   SELECT "m1" as meter, 2 as value UNION ALL 
   SELECT "m1" as meter, 3 as value, UNION ALL 
   SELECT "m1" as meter, 5 as value, UNION ALL
   SELECT "m1" as meter, 5 as value UNION ALL
   SELECT "m2" as meter, 4 as value UNION ALL
   SELECT "m2" as meter, 6 as value UNION ALL 
   SELECT "m3" as meter, 4 as value UNION ALL
   SELECT "m3" as meter, 4 as value )


SELECT IF(meter="m1", meter, "m7") as meter_new, SUM(value)
FROM data_sample
GROUP BY meter_new

Here the new column, meter_new that is created in the SELECT statement, is already available to the GROUP BY (which makes me think that SELECT executes before the GROUP BY) but this grouping is then used for the SUM statement which is in the SELECT part of the query making it seem that the SELECT is happening after the GROUP BY as well.

Clearly I am missing something here, but googling just keeps returning the same ordered list of operations which is not helping me.


Solution

  • The execution order of BigQuery is totally different from general RDBMS.

    BigQuery uses the Dremel query engine to execute queries.Dremel uses a multi-level serving tree to execute queries . A root server receives incoming queries, reads metadata from the tables, and routes the queries to the next level in the serving tree. You can read more about how BigQuery works from this link1 and link2.

    You can check the execution order of your query using Query Plan Explanation.

    enter image description here