We have exported our billing history to bigquery. I am trying to get the total cost per project but I'm starting to think its impossible to do so because the data includes project.labels
which means there can be multiple rows per billing item.
Here's a query I've just run:
SELECT project.labels.key,project.labels.value,service.description,usage_start_time,usage_end_time,project.id,sku.description,cost
FROM [our-billing-export]
WHERE service.id = "6F81-5844-456A"
and usage_start_time = "2018-04-06 19:25:01.510 UTC"
and usage_end_time = "2018-04-06 21:25:03.785 UTC"
and project.id = "dh-raia"
and sku.id = "D973-5D65-BAB2"
order by project.labels.key,project.labels.value,service.id, usage_start_time,usage_end_time,project.id,sku.id,cost
Notice that we have 3 identical costs for "Storage PD Capacity" which I think is OK, they probably represent 3 different persistent disks. Notice also though that the same 3 costs appear again, this time for a different project.labels.key
.
My aim is to get the total cost per project.id
. Clearly I cannot just issue:
select project.id,sum(cost)
from [our-billing-export]
group by project.id
because some costs would be included more than once (because they appear for multiple project.labels.key
s).
I cannot filter on a single project.labels.key
because we do not guarantee that each project has the same labels.
I cannot try to eliminate the duplication caused by labels like so:
SELECT service.id,sku.id,usage_start_time,usage_end_time, project.id,cost
FROM [our-billing-export]
GROUP BY service.id,sku.id ,usage_start_time,usage_end_time,project.id,cost
because that would eliminate the three valid line items that each have the same cost.
I cannot use the OVER() clause to obtain a single label per client like so:
SELECT project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,rownum
FROM (
SELECT project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,
ROW_NUMBER() OVER (PARTITION BY project.id,service.id,usage_start_time,usage_end_time,sku.id,project.labels.key) as rownum
FROM [our-billing-export]
)q
WHERE rownum=1
because when I do so I get error Repeated field 'project.labels.key' as PARTITION BY key is not allowed.
So, as far as I can tell there is no way to get a definitive answer to the question "How much have I spent on each project?" I'm hoping someone can tell me I'm wrong and that there IS a way to accomplish this.
OK, I've managed to resolve this (with some help from a colleague)
SELECT service.description
, sku.description
, project.name
, labels
, cost
FROM (
SELECT service.description
, sku.description
, project.name
, group_concat(project.labels.key + ':' + project.labels.value) WITHIN RECORD AS labels
, cost
FROM [our-billing-export]
WHERE usage_start_time = "2018-04-06 19:25:01.510 UTC"
AND usage_end_time = "2018-04-06 21:25:03.785 UTC"
AND project.id = 'dh-raia' AND cost > 0
AND sku.id = "D973-5D65-BAB2"
)
which can then be aggregated.