Tags: amazon-eks, nvidia, amazon-ami, containerd

containerd runtime configuration does not persist after AMI creation


I have created an AMI to be deployed in my AWS EKS cluster. The image was created from an EC2 instance of type p3.2xlarge, on which I installed the CUDA drivers and the NVIDIA Container Toolkit and configured containerd as explained here:

Everything works fine on the EC2 instance, but after creating the AMI and using it in the launch template of my cluster nodes, the containerd config located at /etc/containerd/config.toml does not persist.

On the EC2 instance, the /etc/containerd/config.toml file looks like this:

root = "/var/lib/containerd"
state = "/run/containerd"
version = 2
 
[grpc]
  address = "/run/containerd/containerd.sock"
 
[plugins]
 
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5"
 
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
 
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      discard_unpacked_layers = true
 
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
 
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
 
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            SystemdCgroup = true
 
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
 
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
 
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

but on the node deployed from the AMI, it looks like this:

version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
 
[grpc]
address = "/run/containerd/containerd.sock"
 
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
 
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5"
 
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
 
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
 
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
 
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

In other words, the change I made on the EC2 instance did not survive the AMI creation: the node is back to "runc" as the default runtime and has no "nvidia" runtime defined.

Does anybody know why this is happening or how to fix it?

Thanks in advance


Solution

  • I finally found how to properly set up the containerd runtime configuration. I followed the AWS documentation on running commands on your Linux instance at launch and applied the same approach to my AMI with these steps:

    1. In the launch template dashboard, select the launch template corresponding to my cluster nodes.
    2. Once in the launch template, select Actions and then Modify template (Create new version).
    3. Scroll down to the Advanced details section and expand it.
    4. Scroll down to the User data section.
    5. Paste the commands that modify the containerd runtime configuration.
    6. Set the new version as the default version of the template.
    7. Modify the Auto Scaling group settings, selecting the new launch template version.

    That's it. Everything works perfectly now.
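The answer does not show the commands pasted into User data in step 5. A plausible reason the original edit was lost (an assumption, not confirmed in the question) is that the EKS-optimized AMI's `/etc/eks/bootstrap.sh` regenerates `/etc/containerd/config.toml` from its default template when the node starts, so any change baked into the AMI is overwritten and has to be reapplied at every launch, after bootstrap has run. The following is a minimal sketch of such a patch step in Python, not the author's actual commands: `enable_nvidia_runtime`, `NVIDIA_RUNTIME` and the script name are hypothetical, the runtime tables are copied from the working EC2 config above, and TOML validation via `tomllib` is skipped on Python versions older than 3.11.

```python
import re
import sys

try:
    import tomllib  # Python 3.11+; only used to validate the patched file
except ModuleNotFoundError:
    tomllib = None

# Runtime tables copied from the working EC2 config in the question.
NVIDIA_RUNTIME = """
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
  SystemdCgroup = true
"""


def enable_nvidia_runtime(config_text: str) -> str:
    """Return config_text with "nvidia" as the default CRI runtime (idempotent)."""
    # Point the existing default_runtime_name key at the nvidia runtime.
    patched, count = re.subn(
        r'^([ \t]*default_runtime_name[ \t]*=[ \t]*)"[^"]*"',
        r'\1"nvidia"',
        config_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("default_runtime_name not found in config")
    # Add the nvidia runtime tables unless a previous run already did.
    if "containerd.runtimes.nvidia]" not in patched:
        patched = patched.rstrip("\n") + "\n" + NVIDIA_RUNTIME
    if tomllib is not None:
        tomllib.loads(patched)  # raises TOMLDecodeError on invalid output
    return patched


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. /etc/containerd/config.toml
    with open(path, encoding="utf-8") as f:
        new_text = enable_nvidia_runtime(f.read())
    # Write only after patching succeeded, so a failure leaves the file intact.
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_text)
```

In user data, this would run after the bootstrap call, e.g. `python3 enable_nvidia_runtime.py /etc/containerd/config.toml`, followed by `systemctl restart containerd` so containerd rereads the file. Patching the generated file in place, rather than overwriting it with the EC2 copy, keeps whatever values bootstrap wrote, such as the regional `sandbox_image`.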