I am using BigQuery to get GCP usage data using queries. I have a single query to get all details from GCP exported Bill data to BigQuery.
select sum(cost) as cost, sum(credits.amount) as credit FROM `TABLENAME`
LEFT JOIN UNNEST(credits) AS credits
WHERE invoice.month="202404"
AND
select sum(cost) as cost FROM `TABLENAME`
WHERE invoice.month="202404"
in both queries, I am getting different cost values. Why I am getting like this?
Above is a sample query.
Below one is my actual query. Here I am aggregating hourly data as daily data by filtering zero cost and credit amounts. in this scenario, I am getting multiple duplicate rows in result. It's Bigquery cost export data.
SELECT
billing_account_id,
service.id AS service_id,
service.description AS service_des,
sku.id AS sku_id,
sku.description AS sku_des,
FORMAT_DATETIME('%Y-%m-%d', usage_start_time) AS usage_date,
location.location AS location,
location.region AS region,
location.country AS country,
location.zone AS zone,
invoice.month AS invoice_mon,
currency,
AVG(currency_conversion_rate) as currency_conversion_rate,
cost_type,
resource.name AS resource_name,
resource.global_name AS res_global_name,
SUM(usage.amount_in_pricing_units) AS usage_amount_pricing_units,
usage.pricing_unit AS pricing_unit,
SUM(cost) AS cost,
SUM(cost_at_list) AS cost_at_list,
transaction_type,
seller_name,
adjustment_info.id AS adjustment_info_id,
adjustment_info.description AS adjustment_des,
adjustment_info.type AS adjustment_type,
adjustment_info.mode AS adjustment_mode,
SUM(price.effective_price) AS effective_price,
SUM(price.pricing_unit_quantity) AS pricing_unit_quantity,
project.id AS project_id,
project.number AS project_number,
project.name AS project_name,
SUM(IFNULL(credits.amount, 0)) AS credit_amount,
credits.type AS credit_type,
credits.id AS credit_id,
credits.name as credit_name,
credits.full_name AS credit_full_name,
TO_JSON_STRING(labels) AS labels,
TO_JSON_STRING(system_labels) AS system_labels
FROM `TABLENAME`
LEFT JOIN UNNEST(credits) AS credits
WHERE (cost!=0 OR credits.amount!=0) and invoice.month="202404"
GROUP BY
billing_account_id,
service_id,
service_des,
sku_id,
sku_des,
usage_date,
location,
region,
country,
zone,
invoice_mon,
currency,
cost_type,
resource_name,
res_global_name,
usage.pricing_unit,
transaction_type,
seller_name,
adjustment_info_id,
adjustment_des,
adjustment_type,
adjustment_mode,
project_id,
project_number,
project_name,
credit_type,
credit_full_name,
credit_id,
credit_name,
labels,
system_labels
HAVING
(cost!=0 OR credit_amount!=0)
ORDER BY
usage_date
Edit:::
SELECT
billing_account_id,
service.id AS service_id,
service.description AS service_des,
sku.id AS sku_id,
sku.description AS sku_des,
FORMAT_DATETIME('%Y-%m-%d', usage_start_time) AS usage_date,
location.location AS location,
location.region AS region,
location.country AS country,
location.zone AS zone,
invoice.month AS invoice_mon,
currency,
AVG(currency_conversion_rate) as currency_conversion_rate,
cost_type,
resource.name AS resource_name,
resource.global_name AS res_global_name,
SUM(usage.amount_in_pricing_units) AS usage_amount_pricing_units,
usage.pricing_unit AS pricing_unit,
SUM(IF(IFNULL(credits_offset, 0) = 0, cost, 0)) AS cost,
SUM(cost_at_list) AS cost_at_list,
transaction_type,
seller_name,
adjustment_info.id AS adjustment_info_id,
adjustment_info.description AS adjustment_des,
adjustment_info.type AS adjustment_type,
adjustment_info.mode AS adjustment_mode,
SUM(price.effective_price) AS effective_price,
SUM(price.pricing_unit_quantity) AS pricing_unit_quantity,
project.id AS project_id,
project.number AS project_number,
project.name AS project_name,
SUM(IFNULL(credits.amount, 0)) AS credit_amount,
credits.type AS credit_type,
credits.id AS credit_id,
credits.name as credit_name,
credits.full_name AS credit_full_name,
TO_JSON_STRING(labels) AS labels,
TO_JSON_STRING(system_labels) AS system_labels
FROM `TABLENAME`
LEFT JOIN UNNEST(credits) credits WITH OFFSET AS credits_offset
WHERE (cost!=0 OR credits.amount!=0)
GROUP BY
billing_account_id,
service_id,
service_des,
sku_id,
sku_des,
usage_date,
location,
region,
country,
zone,
invoice_mon,
currency,
cost_type,
resource_name,
res_global_name,
usage.pricing_unit,
transaction_type,
seller_name,
adjustment_info_id,
adjustment_des,
adjustment_type,
adjustment_mode,
project_id,
project_number,
project_name,
credit_type,
credit_full_name,
credit_id,
credit_name,
labels,
system_labels
HAVING
(cost!=0 OR credit_amount!=0)
ORDER BY
usage_date
it's because of your LEFT JOIN
creating a 1:N relationship, thus duplicating the data in cost
column.
Your table essentially looks like this:
cost | credits |
---|---|
123 | ARRAY[{amount: 1}, {amount: 5}] |
345 | ARRAY[{amount: 3}] |
and the cost before unnesting is 468.
When you unnest the array, you're left with flat table, which looks like this
cost | credits |
---|---|
123 | 1 |
123 | 5 |
345 | 3 |
and when you sum both, you get inflated numbers for cost, because of the duplication.
I suggest unnesting the credits
array with offset and only summing the cost
with offset 0, eg. something like this:
WITH cost_data AS (
SELECT
'compute' AS sku
, 123 AS cost
, [STRUCT (1 AS amount), STRUCT (5 AS amount)] AS credits
UNION ALL
SELECT
'storage' AS sku
, 456 AS cost
, [STRUCT (3 AS amount)]
)
SELECT sku, SUM(IF(credits_offset = 0, cost, 0)) AS cost, SUM(unn.amount) AS credit_amount
FROM cost_data
LEFT JOIN UNNEST(credits) unn WITH OFFSET AS credits_offset
GROUP BY sku;
EDIT: As pointed out in the below comment, this would only return rows where any credits were used, since I was filtering based on the offset and offset is present only if an unnested array is. In that case, just use SUM(IF(IFNULL(credits_offset, 0) = 0, cost, 0))
instead of SUM(IF(credits_offset = 0, cost, 0))
, to include all rows.