Tags: amazon-web-services, kubernetes, amazon-ec2

Missing NVMe SSD in AWS Kubernetes


AWS seems to be hiding my NVMe SSD when an r6gd instance is deployed in Kubernetes via the eksctl config below.

# eksctl create cluster -f spot04test00.yaml                                                      
apiVersion: eksctl.io/v1alpha5               
kind: ClusterConfig                          
metadata:                                    
  name: tidb-arm-dev #replace with your cluster name
  region: ap-southeast-1 #replace with your preferred AWS region
nodeGroups:                                  
  - name: tiflash-1a                         
    desiredCapacity: 1                       
    availabilityZones: ["ap-southeast-1a"]   
    instancesDistribution:                   
      instanceTypes: ["r6gd.medium"]         
    privateNetworking: true                  
    labels:                                  
      dedicated: tiflash

The running instance has an 80 GiB EBS gp3 volume and ZERO NVMe SSD storage, as shown in Figure 1.

Figure 1. The 59 GiB NVMe SSD for the r6gd instance is swapped out for an 80 GiB gp3 EBS volume. What happened to my NVMe SSD?

Why did Amazon swap out the 59 GiB NVMe for an 80 GiB EBS gp3 volume?

Where has my NVMe disk gone?

  1. Even if I pre-allocate ephemeral-storage using non-managed nodeGroups, it still shows an 80 GiB EBS volume (Figure 1).

  2. If I use the AWS Web Console to start a new r6gd instance, it clearly shows the attached NVMe SSD (Figure 2).

Figure 2. 59 GiB NVMe for r6gd instance created via AWS Web Console.

After further experimentation, I found that the 80 GiB EBS volume is attached to r6gd.medium, r6g.medium, r6gd.large, and r6g.large instances as an 'ephemeral' resource, regardless of instance size.

kubectl describe nodes:

Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           83864556Ki
  hugepages-2Mi:               0
  memory:                      16307140Ki
  pods:                        29
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           77289574682
  hugepages-2Mi:               0
  memory:                      16204740Ki
  pods:                        29

Awaiting enlightenment from folks who have successfully utilized NVMe SSDs in Kubernetes.


Solution

  • Solved my issue; here are my learnings:

    1. The NVMe SSD will not show up by default (either in the AWS web console or in the VM's terminal), but it is accessible as /dev/nvme1. Yes, you need to format and mount it. For a single VM that is straightforward, but for k8s you need to deliberately format it before you can use it.
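
       A minimal sketch of the single-VM case, assuming the instance store appears as /dev/nvme1n1 (the exact device name can differ, so check with lsblk first):

        # list block devices and identify the instance-store NVMe disk
        lsblk
        # format it (this erases anything on the disk) and mount it
        sudo mkfs.ext4 /dev/nvme1n1
        sudo mkdir -p /mnt/nvme
        sudo mount /dev/nvme1n1 /mnt/nvme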

    2. The 80 GB default can be overridden with settings in the cluster config file (see volumeSize in the eks-setup.yaml excerpt below).

    3. To utilize the VM-attached NVMe in k8s, you need to run these 2 additional Kubernetes services while setting up the k8s nodes (a minimal StorageClass sketch follows the two entries below). Remember to modify the YAML files of the 2 services to use ARM64 images if you are using ARM64 VMs:

      a. storage-local-static-provisioner

      • ARM64 image: jasonxh/local-volume-provisioner:latest

      b. eks-nvme-ssd-provisioner

      • ARM64 image: zhangguiyu/eks-nvme-ssd-provisioner
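
       For reference, a minimal sketch of the kind of StorageClass the local static provisioner exposes these disks through; the name is assumed to match the local-storage storageClassName used in pods.yaml below:

        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: local-storage            # assumed to match the class used by the provisioner
        provisioner: kubernetes.io/no-provisioner
        volumeBindingMode: WaitForFirstConsumer
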
    4. The NVMe will never show up as part of the ephemeral storage of your k8s cluster; that ephemeral storage describes the EBS volume attached to each VM. I have since restricted mine to a 20 GiB EBS volume.

    5. The PVs will show up when you type kubectl get pv:

    6. Copies of the TiDB node config files are below for reference:

    • kubectl get pv

        guiyu@mi:~/dst/bin$ kubectl get pv
        NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS    REASON   AGE
        local-pv-1a3321d4   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-2       local-storage            9d
        local-pv-82e9e739   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-1           local-storage            9d
        local-pv-b9556b9b   107Gi      RWO            Retain           Bound    tidb-cluster-dev/data0-tidb-arm-dev-tiflash-2   local-storage            6d8h
        local-pv-ce6f61f2   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-2           local-storage            9d
        local-pv-da670e42   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-3       local-storage            6d8h
        local-pv-f09b19f4   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-0           local-storage            9d
        local-pv-f337849f   107Gi      RWO            Retain           Bound    tidb-cluster-dev/data0-tidb-arm-dev-tiflash-0   local-storage            9d
        local-pv-ff2f11c6   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-0       local-storage            9d
      
    • pods.yaml

      tiflash:
        baseImage: pingcap/tiflash-arm64
        maxFailoverCount: 3
        replicas: 2
        nodeSelector:
          dedicated: tiflash
        tolerations:
        - effect: NoSchedule
          key: dedicated
          operator: Equal
          value: tiflash
        storageClaims:
        - resources:
            requests:
              storage: "100Gi"
          storageClassName: local-storage
      
    • eks-setup.yaml

      - name: tiflash-1a
        desiredCapacity: 1
        instanceTypes: ["r6gd.large"]
        privateNetworking: true
        availabilityZones: ["ap-southeast-1a"]
        spot: false
        volumeSize: 20      # GiB EBS gp3 3000 IOPS
        volumeType: gp3
        ssh:
          allow: true
          publicKeyPath: '~/dst/etc/data-platform-dev.pub'
        labels:
          dedicated: tiflash
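
    • How the excerpt above might be applied, assuming it sits under nodeGroups: in a full ClusterConfig like the one at the top of this post (the file name is just illustrative):

      eksctl create nodegroup --config-file=eks-setup.yaml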