
Upgrading Kubernetes NGINX to use the new StackDriver resource model in External Metrics


I have successfully set up NGINX as an ingress for my Kubernetes cluster on GKE. I have enabled and configured external metrics (and I am using an external metric in my HPA for auto-scaling). All good there and it's working well.

However, I am getting a deprecation warning in StackDriver for these external metrics. I have discovered that the warnings are caused by "old" resource types being used.

For example, using this command:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections" | jq

I get this output:

{
  "metricName": "custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections",
  "metricLabels": {
    "metric.labels.controller_class": "nginx",
    "metric.labels.controller_namespace": "ingress-nginx",
    "metric.labels.controller_pod": "nginx-ingress-controller-[snip]",
    "metric.labels.state": "writing",
    "resource.labels.cluster_name": "[snip]",
    "resource.labels.container_name": "",
    "resource.labels.instance_id": "[snip]",
    "resource.labels.namespace_id": "ingress-nginx",
    "resource.labels.pod_id": "nginx-ingress-controller-[snip]",
    "resource.labels.project_id": "[snip]",
    "resource.labels.zone": "[snip]",
    "resource.type": "gke_container"
  },
  "timestamp": "2020-01-26T05:17:33Z",
  "value": "1"
}

Note that the "resource.type" field is "gke_container". As of the next version of Kubernetes this needs to be "k8s_container".

I have looked through the Kubernetes NGINX configuration to try to determine when (or if) an upgrade has been made to support the new StackDriver resource model, but I have failed so far. And I would rather not "blindly" upgrade NGINX if I can help it (even in UAT).

These are the Docker images that I am currently using:

quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.26.2
gcr.io/google-containers/prometheus-to-sd:v0.9.0
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0

Could anyone help out here?

Thanks in advance, Ben


Solution

  • OK, this has nothing to do with NGINX and everything to do with Prometheus (specifically, the prometheus-to-sd sidecar).

    For future readers: if your prometheus-to-sd container spec looks like this:

            - name: prometheus-to-sd
              image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
              ports:
                - name: profiler
                  containerPort: 6060
              command:
                - /monitor
                - --stackdriver-prefix=custom.googleapis.com
                - --source=nginx-ingress-controller:http://localhost:10254/metrics
                - --pod-id=$(POD_NAME)
                - --namespace-id=$(POD_NAMESPACE)
    

    Then it needs to look like this:

            - name: prometheus-to-sd
              image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
              ports:
                - name: profiler
                  containerPort: 6060
              command:
                - /monitor
                - --stackdriver-prefix=custom.googleapis.com
                - --source=nginx-ingress-controller:http://localhost:10254/metrics
                - --monitored-resource-type-prefix=k8s_
                - --pod-id=$(POD_NAME)
                - --namespace-id=$(POD_NAMESPACE)
    

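    That is, add the --monitored-resource-type-prefix=k8s_ option, which makes prometheus-to-sd report metrics against the new k8s_* resource types instead of the old gke_container type.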
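To confirm the change took effect, one option is to re-run the query from the question and look only at the resource type. The sketch below is a minimal helper, not part of the original answer: the function names `resource_types` and `check_metric` are hypothetical, and it uses a plain grep/sed match on the JSON text rather than a full JSON parser, so treat it as a quick sanity check only.

```shell
# Hypothetical helper: print each distinct "resource.type" value
# found in JSON text read from stdin (works on compact or pretty-printed JSON).
resource_types() {
  grep -o '"resource\.type":[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}

# Hypothetical wrapper: query the external metrics API for one metric
# and print the resource type(s) it is reported against.
# $1 = namespace, $2 = metric name (with "|" separators, as in the question).
check_metric() {
  kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/$1/$2" \
    | resource_types
}
```

With the metric from the question, this would be called as `check_metric default 'custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections'`. Once the sidecar has restarted with the new option and fresh data points have arrived, the printed type should start with `k8s_` (the question expects `k8s_container`) rather than `gke_container`.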