google-cloud-platform google-cloud-stackdriver

Is there a good explanation of resource labels versus metric labels?

When writing custom metrics you have resource labels and metric labels.

Resource labels are labels that are required by the resource. So if I choose the generic_task resource type and exclude the task_id resource label I get an error.

Metric labels seem to be arbitrary additional labels?

I can aggregate metrics by both resource and metric labels.

What is the point of having both? Is it merely to enforce consistency? So I can say generic_task and know for sure all metrics of that type have specific labels? And then if I still need more labels I use metric labels?

Is it a best practice to choose the best resource type and avoid metric labels where possible (for consistency)?

It seems to me like I should just always use the global type because it requires the fewest labels, which means I'm not spending money on dimensions I don't need.

Is there a cost difference or other operational difference between resource and metric labels?

I'm having trouble finding clear best practices in the Google docs.

Solution

The documentation is quite comprehensive on this topic. See:

A word about Labels; and
Labels

The example used is helpful:

Resource labels disambiguate (uniquely identify) a resource.
Metric labels disambiguate (uniquely identify) a metric.

E.g. for Pods running on Kubernetes Engine and making HTTP requests, the resource labels identify the resource (Kubernetes cluster, node, pod, container, etc.) and the metric labels identify the metric (HTTP method, HTTP response code, etc.).
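To make the split concrete, here is a minimal sketch of what one such time series could look like, written as a plain Python dict in the shape of the Cloud Monitoring v3 TimeSeries JSON. The metric type name, the helper `build_time_series`, and all label values are hypothetical and only for illustration; the k8s_container label keys are the ones I believe that resource type expects, so check them against the monitored resource list.

```python
import json


def build_time_series(resource_type, resource_labels, metric_labels, value, end_time):
    """Return a dict shaped like a Cloud Monitoring v3 TimeSeries (illustrative only)."""
    return {
        "metric": {
            # Hypothetical custom metric type.
            "type": "custom.googleapis.com/http/request_count",
            # Metric labels: describe WHAT was measured.
            "labels": metric_labels,
        },
        "resource": {
            "type": resource_type,
            # Resource labels: describe WHERE it was measured.
            "labels": resource_labels,
        },
        "points": [
            {
                "interval": {"endTime": end_time},
                # int64 values are carried as strings in the JSON form.
                "value": {"int64Value": str(value)},
            }
        ],
    }


series = build_time_series(
    "k8s_container",
    {
        "project_id": "my-project",      # hypothetical values throughout
        "location": "us-central1-a",
        "cluster_name": "prod-cluster",
        "namespace_name": "default",
        "pod_name": "web-1",
        "container_name": "app",
    },
    {"method": "GET", "response_code": "200"},
    42,
    "2024-01-01T00:00:00Z",
)

print(json.dumps(series, indent=2))
```

Nothing here calls the API; it only shows that the two label sets live in separate places in the same time series.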

The same metric bound to a VM would have a different set of resource labels but could be aggregated with the same metric on the Kubernetes cluster due to the shared labels.
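As a rough illustration of that cross-resource aggregation, the sketch below sums samples of one metric by a shared metric label, regardless of which resource type reported them. The sample data and the helper names `sum_by_metric_label` and `sum_by_resource_type` are made up for this example; in practice the grouping would be done by Cloud Monitoring itself rather than in your own code.

```python
from collections import defaultdict

# Hypothetical samples of the same metric reported from two resource types.
samples = [
    {"resource_type": "k8s_container",
     "resource_labels": {"cluster_name": "prod-cluster", "pod_name": "web-1"},
     "metric_labels": {"method": "GET", "response_code": "200"},
     "value": 10},
    {"resource_type": "k8s_container",
     "resource_labels": {"cluster_name": "prod-cluster", "pod_name": "web-1"},
     "metric_labels": {"method": "GET", "response_code": "500"},
     "value": 2},
    {"resource_type": "gce_instance",
     "resource_labels": {"instance_id": "1234567890", "zone": "us-central1-a"},
     "metric_labels": {"method": "GET", "response_code": "200"},
     "value": 5},
    {"resource_type": "gce_instance",
     "resource_labels": {"instance_id": "1234567890", "zone": "us-central1-a"},
     "metric_labels": {"method": "GET", "response_code": "500"},
     "value": 1},
]


def sum_by_metric_label(rows, label):
    """Sum values grouped by one metric label, ignoring the resource entirely."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["metric_labels"][label]] += row["value"]
    return dict(totals)


def sum_by_resource_type(rows):
    """Sum values grouped by the resource type that reported them."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["resource_type"]] += row["value"]
    return dict(totals)


by_code = sum_by_metric_label(samples, "response_code")
by_resource = sum_by_resource_type(samples)

print(by_code)
print(by_resource)
```

The resource label sets differ between the two resource types, but the metric labels are identical, which is what lets the response codes be totalled across both.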

You are prudent to be concerned about metric cardinality. IIRC many of the standard (internal) metrics aren't billed. See "View and manage metric usage" for ways to review and reduce the costs for those that are billed.