
KernelId for Packer EBS builder


Is there a way to set the Kernel ID to use with the Packer EBS builder?

I'm trying to build an AMI (Amazon Machine Image) for ECS (Elastic Container Service) with NVIDIA's drivers installed so that I can run GPU tasks within containers.

Here's my Packer file:

{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": "",
    "region": "us-west-2",

    "ami": "ami-62d35c02",
    "ami_arch": "x86_64",

    "nvidia_release": "367.57"
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "{{ user `aws_access_key` }}",
      "secret_key": "{{ user `aws_secret_key` }}",
      "region": "{{ user `region` }}",
      "source_ami": "{{ user `ami` }}",
      "instance_type": "g2.2xlarge",
      "ssh_username": "ec2-user",
      "ami_name": "ecs-machine-image",
      "user_data": "#cloud-config\nrepo_releasever: 2016.09"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo yum update -y",
        "sudo yum groupinstall -y \"Development Tools\"",
        "sudo yum install -y kernel-devel-$(uname -r)",
        "cd /tmp",
        "curl -L -O http://us.download.nvidia.com/XFree86/Linux-{{ user `ami_arch` }}/{{ user `nvidia_release` }}/NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run",
        "chmod +x NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run",
        "echo `uname -a`",
        "sudo sh -c \"./NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run -silent\""
      ]
    }
  ]
}

This builds fine, but the line that echoes the kernel version (via uname -a) shows that the Packer builder instance is running kernel 4.4.51-40.58. However, when I boot an instance from this AMI via the AWS Console, it is running kernel 4.9.20-11.31 and is thus unable to find the nvidia kernel module, which find shows is present under /lib/modules/4.4.51-40.58/....
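A quick way to see the mismatch on the booted instance (just a diagnostic sketch, not part of the template):

uname -r                               # kernel the instance is actually running
ls /lib/modules                        # kernels that have module trees installed
find /lib/modules -name 'nvidia*.ko'   # where the NVIDIA module actually landed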

I tried having Packer stage the /boot/grub/menu.lst file with the following contents:

# created by imagebuilder
default=0
timeout=0
hiddenmenu

title Amazon Linux 2016.09 (4.4.51-40.58.amzn1.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-4.4.51-40.58.amzn1.x86_64 root=LABEL=/ console=tty1 console=ttyS0
initrd /boot/initramfs-4.4.51-40.58.amzn1.x86_64.img

With this change, the instance I boot is running 4.4.51-40.60 instead of 4.4.51-40.58, and menu.lst contains a corresponding extra entry that I did not specify.

Is building kernel modules into a Packer EBS image as I am trying to do an anti-pattern, or am I missing something?


Solution

  • g2.2xlarge instances are HVM-only, and a kernel ID (AKI) only applies to PV (paravirtual) instances, so setting a kernel ID is not the answer here.

    Instead, the problem is that you run:

    sudo yum update -y
    

    which installs a new kernel that will only be used on the next boot. But then you install the NVIDIA drivers for the currently running kernel:

    curl -L -O http://us.download.nvidia.com/XFree86/Linux-{{ user `ami_arch` }}/{{ user `nvidia_release` }}/NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run
    chmod +x NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run
    echo `uname -a`
    sudo sh -c "./NVIDIA-Linux-{{ user `ami_arch` }}-{{ user `nvidia_release` }}.run -silent"
    

    I expect it to work fine if you don't update the kernel, or if you explicitly build a module for the new kernel.
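
    As a sketch of the second option (assuming the installer's -k/--kernel-name option and that rpm -q --last lists the newest kernel first; check ./NVIDIA-Linux-x86_64-367.57.run --advanced-options for your driver release), the provisioner could build the module against the kernel that yum update just installed rather than the running one:

    # Update, then target the newly installed kernel instead of the running one.
    sudo yum update -y
    NEW_KERNEL=$(rpm -q --last kernel | head -n1 | awk '{print $1}' | sed 's/^kernel-//')
    sudo yum install -y "kernel-devel-${NEW_KERNEL}"
    sudo ./NVIDIA-Linux-x86_64-367.57.run -silent -k "${NEW_KERNEL}"

    The filename here uses the concrete values of your ami_arch and nvidia_release variables; in the template you would keep the {{ user `...` }} form.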