Upgrading Kubernetes NGINX to use StackDriver's new resource model in External Metrics

I have successfully set up NGINX as an ingress for my Kubernetes cluster on GKE. I have enabled and configured external metrics (and I am using an external metric in my HPA for auto-scaling). All good there and it's working well.

However, StackDriver shows a deprecation warning for these external metrics. I have since found that the warning is caused by the "old" resource types being used.

For example, using this command:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections" | jq

I get this output:

{
  "metricName": "custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections",
  "metricLabels": {
    "metric.labels.controller_class": "nginx",
    "metric.labels.controller_namespace": "ingress-nginx",
    "metric.labels.controller_pod": "nginx-ingress-controller-[snip]",
    "metric.labels.state": "writing",
    "resource.labels.cluster_name": "[snip]",
    "resource.labels.container_name": "",
    "resource.labels.instance_id": "[snip]",
    "resource.labels.namespace_id": "ingress-nginx",
    "resource.labels.pod_id": "nginx-ingress-controller-[snip]",
    "resource.labels.project_id": "[snip]",
    "resource.labels.zone": "[snip]",
    "resource.type": "gke_container"
  },
  "timestamp": "2020-01-26T05:17:33Z",
  "value": "1"
}

Note that the "resource.type" field is "gke_container". As of the next version of Kubernetes this needs to be "k8s_container".

I have looked through the Kubernetes NGINX configuration to try to determine when (or if) an upgrade has been made to support the new StackDriver resource model, but I have failed so far. And I would rather not "blindly" upgrade NGINX if I can help it (even in UAT).

These are the Docker images that I am currently using:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.2
gcr.io/google-containers/prometheus-to-sd:v0.9.0
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0

Could anyone help out here?

Thanks in advance, Ben

Solution

OK, this has nothing to do with NGINX and everything to do with Prometheus (specifically the Prometheus sidecar, prometheus-to-sd).

For future readers: if the prometheus-to-sd container in your deployment is started like this:

        - name: prometheus-to-sd
          image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
          ports:
            - name: profiler
              containerPort: 6060
          command:
            - /monitor
            - --stackdriver-prefix=custom.googleapis.com
            - --source=nginx-ingress-controller:http://localhost:10254/metrics
            - --pod-id=$(POD_NAME)
            - --namespace-id=$(POD_NAMESPACE)

Then it needs to look like this:

        - name: prometheus-to-sd
          image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
          ports:
            - name: profiler
              containerPort: 6060
          command:
            - /monitor
            - --stackdriver-prefix=custom.googleapis.com
            - --source=nginx-ingress-controller:http://localhost:10254/metrics
            - --monitored-resource-type-prefix=k8s_
            - --pod-id=$(POD_NAME)
            - --namespace-id=$(POD_NAMESPACE)

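That is, include the --monitored-resource-type-prefix=k8s_ option, which makes the sidecar report metrics against the new k8s_-prefixed resource types instead of gke_container.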
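To confirm the change took effect after redeploying, you can re-run the query from the question and look only at the resource type. The following is a minimal sketch, not a definitive check: the function names resource_types and check_resource_type are my own, the metric path is copied from the question, and the recursive jq filter is an assumption that lets it cope with the labels sitting either at the top level (as in the output quoted above) or nested inside a list.

```shell
#!/usr/bin/env bash

# Print each distinct "resource.type" value found anywhere in the JSON on stdin.
# The recursive descent (..) means the exact nesting of the response does not matter.
resource_types() {
  jq -r '[.. | objects | select(has("resource.type")) | .["resource.type"]] | unique | .[]'
}

# Metric path taken from the question; adjust the namespace and metric name to your setup.
METRIC_PATH="/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections"

# Query the External Metrics API and list the resource types currently reported.
check_resource_type() {
  kubectl get --raw "$METRIC_PATH" | resource_types
}
```

Source the file and run check_resource_type. Before the change I would expect it to print gke_container; once the updated sidecar has been writing for a while, a type starting with k8s_ should show up instead. It may take a few minutes for new data points to appear.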