Can a query in BigQuery be optimized to show results faster but also be more expensive in computation? Often size and computation are correlated; I'm just trying to think of an edge case where they are decoupled.
Query performance depends on the following factors:
- The number of bytes the query reads.
- The number of bytes the query passes to the next slot.
- The amount of CPU required to run the query.
- The number of bytes the query writes.
- Whether SQL best practices are followed.
Size and computation are correlated and cannot really be decoupled, but you can try the steps below to optimize query performance:
- Use nested and repeated fields to denormalize data storage (see the schema sketch after this list).
- Filter the data with a WHERE clause, and when ordering, pair ORDER BY with LIMIT (see the top-N sketch below).
- Optimize the queries by using built-in SQL functions where possible.
- Reduce the data processed: avoid SELECT *, filter with WHERE, take advantage of partition pruning, avoid wildcard tables and table sharding, and reduce data before a JOIN (see the pruning sketch below).
- Optimize query computation: avoid repeating the same data transformations across queries, optimize JOIN patterns, prefer INT64 keys, prune partitions, avoid multiple evaluations of the same common table expression (CTE), and split complex queries into smaller ones (see the materialization sketch below).
- Avoid repetitive JOINs and subqueries.
- Avoid self joins (see the window-function sketch below), and if the query processes keys that are heavily skewed toward a few values, filter the data as early as possible.
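As an illustration of the first point, a denormalized schema can keep child rows as a nested, repeated STRUCT inside the parent table, so reads avoid a JOIN entirely. This is only a sketch; the dataset, table, and column names are made up:

```sql
-- Hypothetical schema: each order carries its own line items as a
-- repeated STRUCT, so no separate line-items table (and no JOIN) is needed.
CREATE TABLE mydataset.orders (
  order_id STRING,
  order_date DATE,
  customer_id STRING,
  line_items ARRAY<STRUCT<sku STRING, quantity INT64, unit_price NUMERIC>>
)
PARTITION BY order_date;

-- Reading the nested data without a JOIN.
SELECT
  order_id,
  item.sku,
  item.quantity * item.unit_price AS line_total
FROM mydataset.orders,
  UNNEST(line_items) AS item
WHERE order_date = '2024-01-01';
```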
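For filtering and ordering, the idea is to filter with WHERE before aggregating and to pair ORDER BY with LIMIT so the final sort stage only has to produce the top rows. A minimal sketch, again with hypothetical names:

```sql
-- Filter first, then sort only what is needed for the top-N result.
SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM mydataset.transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 100;
```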
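To reduce the bytes a query scans, select only the columns you need and filter on the partitioning column so BigQuery can prune partitions. The sketch below assumes a table partitioned on an event_date column; all names are hypothetical:

```sql
-- Instead of SELECT * over the whole table, read two columns and
-- restrict the scan to one week of partitions.
SELECT
  user_id,
  event_name
FROM mydataset.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07'  -- partition filter
  AND event_name = 'purchase';
```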
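On the CTE point, a WITH clause that is referenced several times may be evaluated more than once. One way around this is to materialize the intermediate result once with BigQuery scripting and reuse it, which is also a natural way to split a complex query into smaller ones. A sketch under those assumptions, with hypothetical names:

```sql
-- Materialize the shared intermediate result once...
CREATE TEMP TABLE daily_totals AS
SELECT
  event_date,
  user_id,
  SUM(amount) AS total
FROM mydataset.events
WHERE event_date >= '2024-01-01'
GROUP BY event_date, user_id;

-- ...then reuse it as many times as needed without recomputing it.
SELECT event_date, COUNT(DISTINCT user_id) AS active_users
FROM daily_totals
GROUP BY event_date;

SELECT user_id, SUM(total) AS month_total
FROM daily_totals
GROUP BY user_id
ORDER BY month_total DESC
LIMIT 10;
```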
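Finally, a window (analytic) function can often replace a self join, and filtering as early as possible keeps heavily skewed keys from dominating the later stages. A sketch with hypothetical names:

```sql
-- Compare each row with the previous row for the same user without a self join.
SELECT
  user_id,
  event_timestamp,
  TIMESTAMP_DIFF(
    event_timestamp,
    LAG(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp),
    SECOND) AS seconds_since_previous_event
FROM mydataset.events
WHERE event_date = '2024-01-01';  -- filter early, before the analytic step
```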
All these practices can optimize query performance and computation. For more information, you can check this link.