Search code examples
google-cloud-platformgoogle-bigquery

Calculate query cost in BigQuery using Capcity compute pricing model


My company is using the Capacity compute pricing model by purchasing slots for BigQuery. I'm calculating the cost of each query so that everyone on my team can review their query performance. I checked the INFORMATION_SCHEMA.JOBS table using the total_slot_ms column with the formula: ROUND((total_slot_ms / 3600000) * hour_slot_price, 2), but this only accounts for about a quarter of the daily cost. However, using ROUND((total_bytes_billed / POW(10, 12)) * scan_bytes_price, 2) gives a number that is closer to the actual cost.

Do you have any idea? Should I continue using total_bytes_billed to calculate the cost of each query even though we are on Capacity compute pricing? Many thanks!

SELECT
  date(creation_time) as creation_date,
  job_id,
  user_email,
  creation_time,
  start_time,
  end_time,
  total_bytes_processed,
  total_bytes_billed,
  ROUND((total_bytes_billed / POW(10, 12)) * 6.25, 2) AS data_processed_cost_usd,
  total_slot_ms,
  ROUND((total_slot_ms / 3600000) * 0.0768, 2) AS slot_time_cost_usd,  
  SAFE_DIVIDE(total_slot_ms, TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots,
  query
FROM
  `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE 1 = 1
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)  -- Last 7 days
ORDER BY
  total_cost_usd DESC

Solution

  • I got an answer from the Google team. I hope it’s helpful for anyone researching a similar issue:

    The total_slot_ms metric, available in the INFORMATION_SCHEMA.JOBS table, is the most relevant measure of how many slots a query consumes and should be the primary factor in cost calculations. To determine the cost per query, use the formula:

    Cost Per Query = (total_slot_ms/ 3600000)× hour_slot_price

    This formula directly correlates the time slots are in use with your committed costs. If you find that the sum of these calculated costs does not match the total daily costs, it could be due to idle slot time or other system overheads not directly attributed to specific queries.

    While total_bytes_billed is traditionally used in On-Demand pricing models, where costs are based on data processed, it can still serve as a supplementary metric under Capacity pricing. It helps assess query efficiency by comparing data processed against slot usage. A query that processes large amounts of data but consumes fewer slots might indicate efficient usage, even under Capacity pricing.

    However, the most critical aspect of managing costs under this pricing model is optimizing slot utilization. Unused slots still contribute to the overall costs, so maximizing their usage across queries is essential. Tools like BigQuery's reservation management can help in better allocating slots across teams or projects, ensuring that your committed resources are fully utilized.