Search code examples
amazon-ekscontainerdtrivy

Trivy on EKS unable to scan any images


I am trying to scan all images deployed on my EKS cluster I am setting up for high security (will be deployed to classified IL5 environment). Kubernetes v1.23, all worker nodes run on Bottlerocket OS.

I expect images to be scanned and available in the VulnerabilityReports CRD.

I was able to successfully install Falco to the cluster (uses containerd). However, when deploying the official Helm chart (0.6.0-rc3) the scan-vulnerability containers start and then immediately error out. I set this environment variable on the trivy-operator deployment:

- name: CONTAINER_RUNTIME_ENDPOINT
  value: /run/containerd/containerd.sock

Output of run with -debug:

{
  "level": "error",
  "ts": 1668286646.865245,
  "logger": "reconciler.vulnerabilityreport",
  "msg": "Scan job container",
  "job": "trivy-system/scan-vulnerabilityreport-74f54b6cd",
  "container": "discovery",
  "status.reason": "Error",
  "status.message": "2022-11-12T20:57:13.674Z\t\u001b[31mFATAL\u001b[0m\timage scan error: scan error: unable to initialize a scanner: unable to initialize a docker scanner: 4 errors occurred:\n\t* unable to inspect the image (023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/docker.io/istio/pilot:1.15.2): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n\t* unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory\n\t* containerd socket not found: /run/containerd/containerd.sock\n\t* GET https://023620263533.dkr.ecr.us-gov-east-1.amazonaws.com/v2/docker.io/istio/pilot/manifests/1.15.2: unexpected status code 401 Unauthorized: Not Authorized\n\n\n\n",
  "stacktrace": "github.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).processFailedScanJob\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:551\ngithub.com/aquasecurity/trivy-operator/pkg/vulnerabilityreport.(*WorkloadController).reconcileJobs.func1\n\t/home/runner/work/trivy-operator/trivy-operator/pkg/vulnerabilityreport/controller.go:376\nsigs.k8s.io/controller-runtime/pkg/reconcile.Func.Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/reconcile/reconcile.go:102\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"
}

I confirmed that bottlerocket uses containerd, as /run/containerd/containerd.sock is specified on my Falco deployment. Even when I mount this as volume onto the pod and set the CONTAINER_RUNTIME_ENDPOINT to this path I get the same error.

Edit I added the following security context:

  seLinuxOptions:
    user: system_u
    role: system_r
    type: control_t
    level: s0-s0:c0.c1023

Solution

  • Initially I mounted the dockershim.sock from the host to the pod, then realized that was not necessary, the error messages were a little misleading, it was really an authentication with ECR issue. Furthermore, the seLinux flags needed to be specified at the pod level, and not the container level.