This morning I discovered that Loki had stopped working in our EKS cluster.
The Loki pod logs show the following:
level=error ts=2022-04-07T10:44:43.298418416Z caller=table_manager.go:233 msg="error syncing tables" err="WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-east-2.amazonaws.com/\": dial tcp XXX.XXX.XXX.XXX:443: i/o timeout"
Example values file (the deployment happens via Flux):
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: loki
  namespace: loki
spec:
  values:
    extraArgs:
      target: all,table-manager
    serviceAccount:
      create: true
      name: lokiaccess
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<PRIVATE_ID>:role/LokiAccess
        eks.amazonaws.com/sts-regional-endpoints: "true"
    config:
      storage_config:
        boltdb_shipper:
          shared_store: s3
        aws:
          s3: s3://us-east-2/<PRIVATE_STORAGE>
          dynamodb:
            dynamodb_url: dynamodb://us-east-2
      schema_config:
        configs:
          - from: "2022-04-04"
            store: aws
            object_store: s3
            schema: v11
            index:
              prefix: loki_
              period: 24h
In the end, the problem turned out to be in the AWS IAM role. I changed the settings and added the following resources to our Terraform script:
"arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_/index/*",
"arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_",
"arn:aws:s3:::${var.aws_s3_loki_storage}/*",
"arn:aws:s3:::${var.aws_s3_loki_storage}"
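
For anyone wondering where those four ARNs go, here is a minimal sketch of how they might sit in a Terraform policy attached to the role. This is not my exact script: the action lists are an assumption based on the permissions the Loki documentation lists for DynamoDB and S3 storage, and the names `loki_access` and `loki-access` are hypothetical. Only the ARNs, the `LokiAccess` role name, and the `aws_s3_loki_storage` variable come from the setup above.

```hcl
data "aws_caller_identity" "current" {}

variable "aws_s3_loki_storage" {
  type        = string
  description = "Name of the S3 bucket Loki stores its data in"
}

data "aws_iam_policy_document" "loki_access" {
  # Index tables managed by the Loki table manager.
  statement {
    sid    = "LokiDynamoDB"
    effect = "Allow"

    # Assumed action list; trim or extend it to match your own policy.
    actions = [
      "dynamodb:BatchGetItem",
      "dynamodb:BatchWriteItem",
      "dynamodb:CreateTable",
      "dynamodb:DeleteItem",
      "dynamodb:DeleteTable",
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:Query",
      "dynamodb:UpdateItem",
      "dynamodb:UpdateTable",
    ]

    resources = [
      "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_/index/*",
      "arn:aws:dynamodb:us-east-2:${data.aws_caller_identity.current.account_id}:table/loki_",
    ]
  }

  # Object storage bucket.
  statement {
    sid    = "LokiS3"
    effect = "Allow"

    # Assumed action list.
    actions = [
      "s3:ListBucket",
      "s3:GetObject",
      "s3:PutObject",
      "s3:DeleteObject",
    ]

    resources = [
      "arn:aws:s3:::${var.aws_s3_loki_storage}/*",
      "arn:aws:s3:::${var.aws_s3_loki_storage}",
    ]
  }
}

# Attach the policy inline to the role referenced by the service account annotation.
resource "aws_iam_role_policy" "loki_access" {
  name   = "loki-access" # hypothetical policy name
  role   = "LokiAccess"
  policy = data.aws_iam_policy_document.loki_access.json
}
```

Running `terraform plan` after a change like this should show the policy being added or updated on the role before anything is applied.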