Search code examples
google-cloud-billing

Is it possible to get an unambiguous cost summary per project?


We have exported our billing history to bigquery. I am trying to get the total cost per project but I'm starting to think its impossible to do so because the data includes project.labels which means there can be multiple rows per billing item.

Here's a query I've just run:

SELECT   project.labels.key,project.labels.value,service.description,usage_start_time,usage_end_time,project.id,sku.description,cost 
FROM     [our-billing-export] 
WHERE    service.id = "6F81-5844-456A"
  and    usage_start_time = "2018-04-06 19:25:01.510 UTC"
  and    usage_end_time = "2018-04-06 21:25:03.785 UTC"
  and    project.id = "dh-raia"
  and    sku.id = "D973-5D65-BAB2"
order by project.labels.key,project.labels.value,service.id, usage_start_time,usage_end_time,project.id,sku.id,cost

which returns this: enter image description here

Notice that we have 3 identical costs for "Storage PD Capacity" which I think is OK, they probably represent 3 different persistent disks. Notice also though that the same 3 costs appear again, this time for a different project.labels.key.

My aim is to get the total cost per project.id. Clearly I cannot just issue:

select project.id,sum(cost)
from [our-billing-export]
group by project.id

because some costs would be included more than once (because they appear for multiple project.labels.keys).

I cannot filter on a single project.labels.key because we do not guarantee that each project has the same labels.

I cannot try to eliminate the duplication caused by labels like so:

SELECT   service.id,sku.id,usage_start_time,usage_end_time, project.id,cost
FROM     [our-billing-export]
GROUP BY  service.id,sku.id ,usage_start_time,usage_end_time,project.id,cost

because that would eliminate the three valid line items that each have the same cost.

I cannot use the OVER() clause to obtain a single label per client like so:

SELECT   project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,rownum
FROM     (
         SELECT   project.labels.key,service.id,usage_start_time,usage_end_time,project.id,sku.id,
                  ROW_NUMBER() OVER (PARTITION BY project.id,service.id,usage_start_time,usage_end_time,sku.id,project.labels.key) as rownum
         FROM     [our-billing-export]
         )q
WHERE rownum=1

because when I do so I get error Repeated field 'project.labels.key' as PARTITION BY key is not allowed.

So, as far as I can tell there is no way to get a definitive answer to the question "How much have I spent on each project?" I'm hoping someone can tell me I'm wrong and that there IS a way to accomplish this.


Solution

  • OK, I've managed to resolve this (with some help from a colleague)

    SELECT service.description
    ,      sku.description
    ,      project.name
    ,      labels
    ,      cost
    FROM (
        SELECT  service.description
        ,       sku.description
        ,       project.name
        ,       group_concat(project.labels.key + ':' + project.labels.value) WITHIN RECORD AS labels
        ,       cost
        FROM [our-billing-export] 
        WHERE usage_start_time = "2018-04-06 19:25:01.510 UTC" 
          AND usage_end_time = "2018-04-06 21:25:03.785 UTC" 
          AND project.id = 'dh-raia' AND cost > 0
          AND sku.id = "D973-5D65-BAB2"
      )
    

    Returns the correct costs enter image description here

    which can then be aggregated.