Tags: kubernetes, amazon-ec2, kubernetes-pod, amazon-ebs

Unable to attach AWS EBS volume: error "instance not found"


Info:

  • Kubernetes Server version: 1.14
  • AWS Cloud Provider
  • EBS volume, storageclass

Details: I installed a StatefulSet in our Kubernetes cluster, but its pod is stuck in "ContainerCreating" status. The logs show this error: AttachVolume.Attach failed for volume pvc-xxxxxx: error finding instance ip-xxxxx : "instance not found"

It was installed successfully about 17 days ago, but reinstalling it for an update left the pod stuck in ContainerCreating.

Attaching the volume to the instance manually works, but attaching it through the StorageClass does not, and the pod stays stuck in ContainerCreating.

StorageClass YAML:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: ssd-default
allowVolumeExpansion: true
parameters:
  encrypted: "true"
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate

PVC YAML:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
  name: data-thanos-store-0
  namespace: thanos
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: ssd-default
  volumeMode: Filesystem
  volumeName: pvc-xxxxxx
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  phase: Bound

PV YAML:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: ap-xxx
    failure-domain.beta.kubernetes.io/zone: ap-xxx
  name: pvc-xxxx
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://xxxxx
  capacity:
    storage: 3Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-athena-thanos-store-0
    namespace: thanos
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - ap-xxx
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - ap-xxx
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ssd-default
  volumeMode: Filesystem
status:
  phase: Bound

kubectl describe pvc output:

Name:          data-athena-thanos-store-0
Namespace:     athena-thanos
StorageClass:  ssd-encrypted
Status:        Bound
Volume:        pvc-xxxx
Labels:        app.kubernetes.io/instance=athena-thanos-store
               app.kubernetes.io/name=athena-thanos-store
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      3Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    athena-thanos-store-0

Solution

  • A FailedAttachVolume error commonly occurs when an EBS volume cannot be detached from one instance and therefore cannot be attached to another, because an EBS volume must be in the available state before it can be attached. FailedAttachVolume is usually a symptom of an underlying failure to unmount and detach the volume.

    Notice that the kubectl describe pvc output shows the StorageClass as ssd-encrypted, which does not match the StorageClass manifest you posted earlier (kind: StorageClass with name: ssd-default). That's why you can mount the volume manually but not via the StorageClass. You can delete and recreate the StorageClass with the correct data.
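
    A minimal sketch of a recreated StorageClass follows. It assumes the PVC is meant to keep requesting ssd-encrypted (the name in the describe output) and that this class should use the same encrypted gp2 settings as ssd-default; both are assumptions to check against your cluster. Because a StorageClass cannot be updated after it is created, remove the old object with kubectl delete storageclass <name> before creating the new one with kubectl apply -f.

```yaml
# Hypothetical StorageClass whose name matches the storageClassName the PVC
# requests (ssd-encrypted, per the describe output). The remaining fields are
# assumed to mirror the ssd-default class posted in the question.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-encrypted
provisioner: kubernetes.io/aws-ebs   # in-tree EBS provisioner, as in ssd-default
parameters:
  type: gp2
  encrypted: "true"                  # quoted: parameters are string values
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

    The is-default-class annotation is deliberately left out here so that ssd-default remains the only default StorageClass in the cluster.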

    Also, I recommend reading this article and using volumeBindingMode: WaitForFirstConsumer instead of volumeBindingMode: Immediate. This setting tells the volume provisioner not to create the volume immediately, but to wait until a pod that uses the associated PVC has been scheduled, so the EBS volume is created in the same availability zone as the node that pod runs on.
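
    As a sketch, the ssd-default class from the question with only the binding mode changed would look like the manifest below. Every other field is copied unchanged from the question. Since the class cannot be edited in place, it has to be deleted and recreated, and the new mode only affects volumes provisioned afterwards; existing PVs keep the zone they were created in.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: ssd-default
allowVolumeExpansion: true
parameters:
  encrypted: "true"
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
# Delay binding and provisioning until a pod using the PVC is scheduled,
# so the EBS volume is created in the availability zone of that pod's node.
volumeBindingMode: WaitForFirstConsumer
```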