My company is using the Capacity compute pricing
model by purchasing slots for BigQuery. I'm calculating the cost of each query so that everyone on my team can review their query performance. I checked the INFORMATION_SCHEMA.JOBS
table using the total_slot_ms
column with the formula: ROUND((total_slot_ms / 3600000) * hour_slot_price, 2)
, but this only accounts for about a quarter of the daily cost. However, using ROUND((total_bytes_billed / POW(10, 12)) * scan_bytes_price, 2)
gives a number that is closer to the actual cost.
Do you have any idea? Should I continue using total_bytes_billed
to calculate the cost of each query even though we are on Capacity compute pricing? Many thanks!
SELECT
date(creation_time) as creation_date,
job_id,
user_email,
creation_time,
start_time,
end_time,
total_bytes_processed,
total_bytes_billed,
ROUND((total_bytes_billed / POW(10, 12)) * 6.25, 2) AS data_processed_cost_usd,
total_slot_ms,
ROUND((total_slot_ms / 3600000) * 0.0768, 2) AS slot_time_cost_usd,
SAFE_DIVIDE(total_slot_ms, TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots,
query
FROM
`region-eu`.INFORMATION_SCHEMA.JOBS
WHERE 1 = 1
AND job_type = 'QUERY'
AND state = 'DONE'
AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) -- Last 7 days
ORDER BY
total_cost_usd DESC
I got an answer from the Google team. I hope it’s helpful for anyone researching a similar issue:
The
total_slot_ms
metric, available in theINFORMATION_SCHEMA.JOBS
table, is the most relevant measure of how many slots a query consumes and should be the primary factor in cost calculations. To determine the cost per query, use the formula:
Cost Per Query = (total_slot_ms/ 3600000)× hour_slot_price
This formula directly correlates the time slots are in use with your committed costs. If you find that the sum of these calculated costs does not match the total daily costs, it could be due to idle slot time or other system overheads not directly attributed to specific queries.
While
total_bytes_billed
is traditionally used in On-Demand pricing models, where costs are based on data processed, it can still serve as a supplementary metric under Capacity pricing. It helps assess query efficiency by comparing data processed against slot usage. A query that processes large amounts of data but consumes fewer slots might indicate efficient usage, even under Capacity pricing.However, the most critical aspect of managing costs under this pricing model is optimizing slot utilization. Unused slots still contribute to the overall costs, so maximizing their usage across queries is essential. Tools like BigQuery's reservation management can help in better allocating slots across teams or projects, ensuring that your committed resources are fully utilized.