elasticsearch, kubernetes, kubernetes-statefulset, kubespray, efk

Unable to deploy EFK stack on Kubernetes (using kubespray)


I'm trying to deploy an EFK stack on a production Kubernetes cluster (installed using kubespray). We have 3 nodes: 1 master + 2 workers. I need to run Elasticsearch as a StatefulSet and use a local folder on the master node to store logs (local storage for persistence). Here is my configuration:

kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging

---
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
  namespace: kube-logging
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
  namespace: kube-logging
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /tmp/elastic
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
            limits:
              cpu: 1000m
              memory: 2Gi
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: local-storage
      resources:
        requests:
          storage: 5Gi
---

This is my configuration, but when it's applied, one of the two Elasticsearch pods stays in Pending status. When I run kubectl describe on that pod, this is the error I get: "1 node(s) didn't find available persistent volumes to bind"

Is my configuration correct? Must I use PV + StorageClass + volumeClaimTemplates? Thank you in advance.

Those are my outputs:

[root@node1 nex]# kubectl get pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                            STORAGECLASS    REASON   AGE
my-pv   5Gi        RWO            Retain           Bound    kube-logging/data-es-cluster-0   local-storage            24m
[root@node1 nex]# kubectl get pvc
NAME                STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
data-es-cluster-0   Bound     my-pv    5Gi        RWO            local-storage   24m
data-es-cluster-1   Pending                                      local-storage   23m
[root@node1 nex]# kubectl describe pvc data-es-cluster-0
Name:          data-es-cluster-0
Namespace:     kube-logging
StorageClass:  local-storage
Status:        Bound
Volume:        my-pv
Labels:        app=elasticsearch
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    es-cluster-0
Events:
  Type    Reason                Age   From                         Message
  ----    ------                ----  ----                         -------
  Normal  WaitForFirstConsumer  24m   persistentvolume-controller  waiting for first consumer to be created before binding
[root@node1 nex]# kubectl describe pvc data-es-cluster-1
Name:          data-es-cluster-1
Namespace:     kube-logging
StorageClass:  local-storage
Status:        Pending
Volume:
Labels:        app=elasticsearch
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    es-cluster-1
Events:
  Type    Reason                Age                   From                         Message
  ----    ------                ----                  ----                         -------
  Normal  WaitForFirstConsumer  4m12s (x82 over 24m)  persistentvolume-controller  waiting for first consumer to be created before binding
[root@node1 nex]#

Solution

  • Is my configuration correct? Must I use PV + StorageClass + volumeClaimTemplates? Thank you in advance.

    Apart from what @Arghya Sadhu already suggested in his answer, I'd like to highlight one more thing in your current setup.

    If you're ok with the fact that your Elasticsearch Pods will be scheduled only on one particular node (in your case your master node), you can still use the local volume type. Don't confuse it, however, with hostPath. I noticed that your PV definition uses the hostPath key, so chances are you're not completely aware of the differences between these two concepts. Although they are quite similar, the local type has more capabilities and some undeniable advantages over hostPath.

    As you can read in the documentation:

    A local volume represents a mounted local storage device such as a disk, partition or directory.

    So it means that apart from a specific directory, you're also able to mount a local disk or partition (/dev/sdb, /dev/sdb5 etc.). It can be e.g. an LVM partition with a strictly defined capacity. Keep in mind that when mounting a local directory you are not able to enforce the capacity that is actually used, so even if you define, let's say, 5Gi, logs can still be written to your local directory after this value is exceeded. That's not the case with a logical volume, as you're able to define its capacity and make sure it won't use more disk space than you gave it.

    The second difference is that:

    Compared to hostPath volumes, local volumes can be used in a durable and portable manner without manually scheduling Pods to nodes, as the system is aware of the volume’s node constraints by looking at the node affinity on the PersistentVolume.

    In this case it is the PersistentVolume where you define your node affinity, so any Pod (it can be a Pod managed by your StatefulSet) which subsequently uses the local-storage storage class and the corresponding PersistentVolume will be automatically scheduled on the right node.

    As you can read further, nodeAffinity is actually a required field in such a PV:

    PersistentVolume nodeAffinity is required when using local volumes. It enables the Kubernetes scheduler to correctly schedule Pods using local volumes to the correct node.

    As far as I understand, your Kubernetes cluster is set up locally / on-premises. In this case NFS could be a good choice.

    If you used a cloud environment, you could use persistent storage offered by your particular cloud provider, e.g. GCEPersistentDisk or AWSElasticBlockStore. You can find the full list of persistent volume types currently supported by Kubernetes here.
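
    If you went that route, dynamic provisioning would save you from creating PVs by hand. Just as a hedged sketch, assuming a GKE cluster (the kubernetes.io/gce-pd provisioner and the pd-standard disk type are GCE-specific; the name is illustrative):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gce-standard ### illustrative name
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-standard
    volumeBindingMode: WaitForFirstConsumer

    With such a StorageClass your original volumeClaimTemplates approach works without any manually created PVs, because every generated PVC gets its own dynamically provisioned disk.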

    So again, if you're concerned about node-level redundancy in your StatefulSet and you would like your 2 Elasticsearch Pods to always be scheduled on different nodes, as @Arghya Sadhu already suggested, use NFS or some other non-local storage.
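
    For completeness, here is a minimal sketch of what an NFS-backed PV could look like. The server address and export path are placeholders for an NFS server you would have to set up and export yourself:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv ### illustrative name
    spec:
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteMany ### NFS can be mounted by Pods on different nodes
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: 10.0.0.10 ### placeholder: address of your NFS server
        path: /exports/elasticsearch ### placeholder: exported directory

    Keep in mind that with static PVs like this one you still need one PV per PVC generated by your volumeClaimTemplates, unless you use an external NFS provisioner that creates them dynamically.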

    However, if you're not concerned about node-level redundancy and you're totally ok with the fact that both your Elasticsearch Pods run on the same node (the master node in your case), please follow me :)

    As @Arghya Sadhu rightly pointed out:

    Even if a PV which is already bound to a PVC has spare capacity, it cannot be bound again to another PVC, because there is a one-to-one mapping between PV and PVC.

    Although there is always a one-to-one mapping between PV and PVC, that doesn't mean you cannot use a single PVC in many Pods.

    Note that in your StatefulSet example you used volumeClaimTemplates, which basically means that each time a new Pod managed by your StatefulSet is created, a new corresponding PersistentVolumeClaim is also created based on this template. So if you have e.g. a single 10Gi PersistentVolume defined, no matter whether your claim requests all 10Gi or only half of it, only the first PVC will be successfully bound to your PV.
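
    So if you wanted to keep volumeClaimTemplates, you would need one PV per replica, e.g. a second local-storage PV that your Pending data-es-cluster-1 claim could bind to. Here is a sketch based on your own PV definition (the name and path are just illustrative):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: my-pv-1 ### illustrative name for the additional PV
    spec:
      storageClassName: local-storage
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteOnce
      hostPath:
        path: /tmp/elastic-1 ### a separate directory for each PV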

    But instead of using volumeClaimTemplates and creating a separate PVC for every stateful Pod, you can make them all use a single, manually defined PVC. Please take a look at the following example:

    The first thing we need is a StorageClass. It looks quite similar to the one in your example:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    

    The first difference between this setup and yours is in the PV definition. Instead of hostPath, here we're using a local volume:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: example-pv
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Delete
      storageClassName: local-storage
      local:
        path: /var/tmp/test ### path on your master node
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - your-master-node-name
    

    Note that apart from defining the local path, we also defined a nodeAffinity rule that makes sure that all Pods which use this particular PV will be automatically scheduled on our master node.

    Then we have our manually applied PVC:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: myclaim
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 10Gi
      storageClassName: local-storage
    

    This PVC can now be used by all (in your example, 2) Pods managed by the StatefulSet:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      selector:
        matchLabels:
          app: nginx # has to match .spec.template.metadata.labels
      serviceName: "nginx"
      replicas: 2 # by default is 1
      template:
        metadata:
          labels:
            app: nginx # has to match .spec.selector.matchLabels
        spec:
          terminationGracePeriodSeconds: 10
          containers:
          - name: nginx
            image: k8s.gcr.io/nginx-slim:0.8
            ports:
            - containerPort: 80
              name: web
            volumeMounts:
            - name: mypd
              mountPath: /usr/share/nginx/html
          volumes:
          - name: mypd
            persistentVolumeClaim:
              claimName: myclaim
    

    Note that in the above example we don't use volumeClaimTemplates any more, but a single PersistentVolumeClaim which can be used by all our Pods. The Pods are still unique, as they are managed by a StatefulSet, but instead of using unique PVCs they use a common one. This works even with the ReadWriteOnce access mode, because ReadWriteOnce restricts a volume to a single node rather than a single Pod, and both Pods are scheduled on the same node anyway. Thanks to this approach both Pods can write logs to a single volume at the same time.

    In my example I used an nginx server to make it as easy as possible to reproduce for everyone who wants to try it out quickly, but I believe you can easily adjust it to your needs.