Tags: kubernetes, google-cloud-platform, permissions, google-kubernetes-engine, metrics

GKE log errors about gke-metrics-agent and UAS


I'm using a private GKE cluster (version 1.23.14-gke.1800). I see the following errors in the kube-system `gke-metrics-agent` pod logs:

```
error uasexporter/exporter.go:190 Error exporting metrics to UAS {"kind": "exporter", "name": "uas", "error": "reading from stream failed: rpc error: code = PermissionDenied desc = The caller does not have permission"}
error uasexporter/exporter.go:226 failed to get response from UAS {"kind": "exporter", "name": "uas", "error": "rpc error: code = PermissionDenied desc = The caller does not have permission"}
```

The log entry carries these labels:

```
app        gke-metrics-agent
component  gke-metrics-agent
container  gke-metrics-agent
filename   /var/log/pods/kube-system_gke-metrics-agent-9rbfv_6896b214-31d2-43bb-b15d-a8e1b122d41d/gke-metrics-agent/0.log
job        kube-system/gke-metrics-agent
namespace  kube-system
node_name  gke-gke-production-production-88f13984-h83x
pod        gke-metrics-agent-9rbfv
stream     stderr
```

The agent's ServiceAccount:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-12-07T10:20:55Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  namespace: kube-system
  resourceVersion: "444"
  uid: ...
secrets:
- name: gke-metrics-agent-token-6zhvq
```

Its ClusterRoleBinding:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "452"
  uid: ...
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gke-metrics-agent
subjects:
- kind: ServiceAccount
  name: gke-metrics-agent
  namespace: kube-system
```
And the ClusterRole it binds to:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "67979037"
  uid: ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - policy
  resourceNames:
  - gce.gke-metrics-agent
  resources:
  - podsecuritypolicies
  verbs:
  - use
```

I believe `gke-metrics-agent` is the official DaemonSet that GKE deploys automatically. It is obviously some kind of permission problem, but I don't even know what UAS stands for, and I can't find any meaningful information about it in the GCP documentation or elsewhere on the Internet. I tried granting some additional cluster roles (`system:gke-uas-metrics-reader`, `external-metrics-reader`) to the existing `gke-metrics-agent` service account, but the problem still persists.
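For reference, granting one of those cluster roles to the agent's service account looks roughly like the sketch below. The binding name `gke-uas-metrics-reader-binding` is an arbitrary choice of mine, not anything GKE-defined; the `kubectl auth can-i` check afterwards is a generic way to confirm what the Kubernetes-side RBAC actually allows (it cannot verify Google-side IAM, which is where a `PermissionDenied` from a gRPC backend may actually originate):

```shell
# Bind the existing system:gke-uas-metrics-reader ClusterRole to the
# gke-metrics-agent ServiceAccount (binding name is arbitrary):
kubectl create clusterrolebinding gke-uas-metrics-reader-binding \
  --clusterrole=system:gke-uas-metrics-reader \
  --serviceaccount=kube-system:gke-metrics-agent

# Check what the service account is permitted to do inside the cluster:
kubectl auth can-i list pods \
  --as=system:serviceaccount:kube-system:gke-metrics-agent
```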

From time to time I also see the following warnings in my cluster, which I suspect are related to this issue: "Kubernetes aggregated API v1beta1.metrics.k8s.io/default is reporting errors" and "Kubernetes aggregated API v1beta1.metrics.k8s.io/default has been only 75% available over the last 10m".
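The health of that aggregated API can be inspected directly with standard kubectl commands; nothing below is specific to this cluster. The `k8s-app=metrics-server` label selector in the last command assumes the API is backed by the usual metrics-server deployment, which may differ on GKE:

```shell
# Check whether the metrics aggregated API is registered and Available:
kubectl get apiservice v1beta1.metrics.k8s.io

# Show the Available condition and any failure message in detail:
kubectl describe apiservice v1beta1.metrics.k8s.io

# If metrics-server backs the API, its logs often show the root cause
# (label selector is an assumption about the backing deployment):
kubectl logs -n kube-system -l k8s-app=metrics-server --tail=50
```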

I would be very thankful if someone could give me at least some direction. Thank you for your time, and please excuse my English!


Solution

  • UAS stands for Unified Autoscaling Platform. It provides predictive and scheduled size recommendations to the Autoscaler backend, i.e. an additional signal to the zonal Autoscaler for Predictive Autoscaling and Scheduled Autoscaling.

    Currently there is a known issue related to UAS. It is caused by a LoggingMonitorConfig problem that Google is working on. Follow the issue link above for further updates, and post a comment there asking whether any workaround is available for now.
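    Since the issue is tied to the cluster's logging/monitoring configuration, it may also help to inspect what the cluster currently has enabled. This is a generic gcloud sketch; `CLUSTER_NAME` and `ZONE` are placeholders you must replace with your own values:

    ```shell
    # Print only the logging and monitoring configuration of the cluster
    # (CLUSTER_NAME and ZONE are placeholders, not real values):
    gcloud container clusters describe CLUSTER_NAME \
      --zone ZONE \
      --format="yaml(loggingConfig, monitoringConfig)"
    ```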

    If you find an issue with a Google product or want to raise a feature request, use the Public Issue Tracker.