Search code examples
google-bigquerycost-management

Does BigQuery optimize SELECT * in subqueries or in queried virtual views?


BigQuery's documentation on cost optimization states that:

BigQuery can provide incredible performance because it stores data as a columnar data structure. This means SELECT * is the most expensive way to query data. This is because it will perform a full query scan across every column present in the table(s), including the ones you might not need.

However I could not find anywhere in the documentation that discuss whether the BigQuery engine optimizes subqueries/virtual views that use SELECT * or not. Examples:

(1)

CREATE VIEW my_view AS ( SELECT * FROM my_table );

SELECT a, b FROM my_view -- does this only access {a, b} or all columns?

(2)

SELECT a, b FROM (SELECT * FROM my_table) AS t -- does this only access {a, b} or all columns?

So the question is: Does BigQuery optimize subquery/view SELECT * to minimize costs, and if so, does it do it reliably?


Solution

  • Does BigQuery optimize subquery/view SELECT * to minimize costs,

    Yes. And you don't need to guess. Query below uses public dataset that you'll see same stats from your project.

    Full table query:

    This query will process 280.6 MB when run.

    enter image description here

    SubQuery (view will be the same)

    This query will process 12.6 MB when run.

    enter image description here

    and if so, does it do it reliably?

    Yes. This is guaranteed.