
Elasticsearch upgrade to v8 on Kubernetes


I have an Elasticsearch deployment on a Microsoft Kubernetes cluster that was deployed with a 7.x chart, and I changed the image to 8.x. The upgrade worked and both Elasticsearch and Kibana were accessible, but now I need to enable the new security feature, which is now included in the basic license. The requirement for security came from the need to enable APM Server/Agents.

I have the following values:

- name: cluster.initial_master_nodes
  value: elasticsearch-master-0,
- name: discovery.seed_hosts
  value: elasticsearch-master-headless
- name: cluster.name
  value: elasticsearch
- name: network.host
  value: 0.0.0.0
- name: cluster.deprecation_indexing.enabled
  value: 'false'
- name: node.roles
  value: data,ingest,master,ml,remote_cluster_client

The Elasticsearch and Kibana pods are able to start, but I am unable to set up the APM integration because security is not enabled. So I enabled security using the values below:

- name: xpack.security.enabled
  value: 'true'

Then I get an error in the Elasticsearch pod log: "Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]". So I enabled SSL using the values below:

- name: xpack.security.transport.ssl.enabled
  value: 'true'

Then I get another error in the Elasticsearch pod log: "invalid SSL configuration for xpack.security.transport.ssl - server ssl configuration requires a key and certificate, but these have not been configured; you must set either [xpack.security.transport.ssl.keystore.path] (p12 file), or both [xpack.security.transport.ssl.key] (pem file) and [xpack.security.transport.ssl.certificate] (pem key file)".

I started with option 1: I created the certificates using the commands below (no password: Enter, Enter for the first command and Enter, Enter, Enter for the second) and copied them to a persistent folder:

./bin/elasticsearch-certutil ca
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
cp elastic-stack-ca.p12 data/elastic-stack-ca.p12
cp elastic-certificates.p12 data/elastic-certificates.p12

In addition, I configured the values below:

- name: xpack.security.transport.ssl.truststore.path
  value: '/usr/share/elasticsearch/data/elastic-certificates.p12'
- name: xpack.security.transport.ssl.keystore.path
  value: '/usr/share/elasticsearch/data/elastic-certificates.p12'

But the pod is still stuck in the initializing state. If I generate the certificates with a password, I get this error in the Elasticsearch pod log: "cannot read configured [PKCS12] keystore (as a truststore) [/usr/share/elasticsearch/data/elastic-certificates.p12] - this is usually caused by an incorrect password; (no password was provided)"
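For the password-protected case, the "no password was provided" message suggests that Elasticsearch was never told the PKCS#12 password. Those passwords are secure settings that live in the Elasticsearch keystore, not in plain configuration values. A minimal sketch, assuming it runs against the Elasticsearch installation directory; the function name `store_p12_password` and the `P12_PASSWORD` variable are made up for illustration, and getting the resulting keystore into the Kubernetes container is not covered here:

```shell
#!/bin/sh
# Sketch: register the PKCS#12 password in the Elasticsearch keystore.
# Assumptions: $1 is the Elasticsearch home directory and P12_PASSWORD
# (a hypothetical variable) holds the password chosen in elasticsearch-certutil.

store_p12_password() {
  es_home="$1"
  for setting in \
    xpack.security.transport.ssl.keystore.secure_password \
    xpack.security.transport.ssl.truststore.secure_password
  do
    # --stdin reads the value from standard input,
    # --force overwrites an existing entry without prompting.
    printf '%s\n' "$P12_PASSWORD" |
      "$es_home/bin/elasticsearch-keystore" add --stdin --force "$setting" || return 1
  done
}
```

It could be called as `P12_PASSWORD=... store_p12_password /usr/share/elasticsearch`. Both settings are written because the same .p12 file is used as keystore and truststore above.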

Then I tried option 2: I created the keys using the commands below and copied them to a persistent folder:

./bin/elasticsearch-certutil ca --pem
unzip elastic-stack-ca.zip -d

cp ca.crt data/ca.crt
cp ca.key data/ca.key

In addition, I configured the values below:

- name: xpack.security.transport.ssl.key
  value: '/usr/share/elasticsearch/data/ca.key'
- name: xpack.security.transport.ssl.certificate
  value: '/usr/share/elasticsearch/data/ca.crt'

But the pod stays in the initializing state without producing any logs; as far as I know, a pod in the initializing state does not produce container logs. In the portal, the events all look fine, except that the Elasticsearch pod is not ready.
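When container logs are not available, the pod's own status and events are usually the next place to look. A rough sketch of collecting them with kubectl; the helper name `inspect_stuck_pod` and the namespace argument are placeholders, and the pod name in the usage line is only assumed from the `cluster.initial_master_nodes` value above:

```shell
#!/bin/sh
# Sketch: gather what Kubernetes reports about a pod that never becomes ready.
# Assumptions: $1 is the pod name, $2 is the namespace (defaults to "default").

inspect_stuck_pod() {
  pod="$1"
  ns="${2:-default}"

  echo "== container states =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'

  echo "== pod description =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== events for this pod =="
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

For example: `inspect_stuck_pod elasticsearch-master-0 my-namespace`. A container that stays in a waiting state while its image has already been pulled can be a hint that something in the pod spec itself, such as a lifecycle hook, is holding it back.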

Finally, I posted the same issue to the Elasticsearch community, without any response: https://discuss.elastic.co/t/elasticsearch-pods-are-not-ready-when-xpack-security-enabled-is-configured/281709?u=s19k15

Here is my StatefulSet:

status:
  observedGeneration: 169
  replicas: 1
  updatedReplicas: 1
  currentRevision: elasticsearch-master-7449d7bd69
  updateRevision: elasticsearch-master-7d8c7b6997
  collisionCount: 0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-master
  template:
    metadata:
      name: elasticsearch-master
      creationTimestamp: null
      labels:
        app: elasticsearch-master
        chart: elasticsearch
        release: platform
    spec:
      initContainers:
        - name: configure-sysctl
          image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
          command:
            - sysctl
            - '-w'
            - vm.max_map_count=262144
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            runAsUser: 0
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:8.1.2
          ports:
            - name: http
              containerPort: 9200
              protocol: TCP
            - name: transport
              containerPort: 9300
              protocol: TCP
          env:
            - name: node.name
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: cluster.initial_master_nodes
              value: elasticsearch-master-0,
            - name: discovery.seed_hosts
              value: elasticsearch-master-headless
            - name: cluster.name
              value: elasticsearch
            - name: cluster.deprecation_indexing.enabled
              value: 'false'
            - name: ES_JAVA_OPTS
              value: '-Xmx512m -Xms512m'
            - name: node.roles
              value: data,ingest,master,ml,remote_cluster_client
            - name: xpack.license.self_generated.type
              value: basic
            - name: xpack.security.enabled
              value: 'true'
            - name: xpack.security.transport.ssl.enabled
              value: 'true'
            - name: xpack.security.transport.ssl.truststore.path
              value: /usr/share/elasticsearch/data/elastic-certificates.p12
            - name: xpack.security.transport.ssl.keystore.path
              value: /usr/share/elasticsearch/data/elastic-certificates.p12
            - name: xpack.security.http.ssl.enabled
              value: 'true'
            - name: xpack.security.http.ssl.truststore.path
              value: /usr/share/elasticsearch/data/elastic-certificates.p12
            - name: xpack.security.http.ssl.keystore.path
              value: /usr/share/elasticsearch/data/elastic-certificates.p12
            - name: logger.org.elasticsearch.discovery
              value: debug
            - name: path.logs
              value: /usr/share/elasticsearch/data
            - name: xpack.security.enrollment.enabled
              value: 'true'
          resources:
            limits:
              cpu: '1'
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 512Mi
          volumeMounts:
            - name: elasticsearch-master
              mountPath: /usr/share/elasticsearch/data
          readinessProbe:
            exec:
              command:
                - bash
                - '-c'
                - >
                  set -e

                  # If the node is starting up wait for the cluster to be ready
                  (request params: "wait_for_status=green&timeout=1s" )

                  # Once it has started only check that the node itself is
                  responding

                  START_FILE=/tmp/.es_start_file


                  # Disable nss cache to avoid filling dentry cache when calling
                  curl

                  # This is required with Elasticsearch Docker using nss < 3.52

                  export NSS_SDB_USE_CACHE=no


                  http () {
                    local path="${1}"
                    local args="${2}"
                    set -- -XGET -s

                    if [ "$args" != "" ]; then
                      set -- "$@" $args
                    fi

                    if [ -n "${ELASTIC_PASSWORD}" ]; then
                      set -- "$@" -u "elastic:${ELASTIC_PASSWORD}"
                    fi

                    curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
                  }


                  if [ -f "${START_FILE}" ]; then
                    echo 'Elasticsearch is already running, lets check the node is healthy'
                    HTTP_CODE=$(http "/" "-w %{http_code}")
                    RC=$?
                    if [[ ${RC} -ne 0 ]]; then
                      echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                      exit ${RC}
                    fi
                    # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                    if [[ ${HTTP_CODE} == "200" ]]; then
                      exit 0
                    elif [[ ${HTTP_CODE} == "503" && "8" == "6" ]]; then
                      exit 0
                    else
                      echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                      exit 1
                    fi

                  else
                    echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                    if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                      touch ${START_FILE}
                      exit 0
                    else
                      echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                      exit 1
                    fi
                  fi
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 3
            failureThreshold: 3
          lifecycle:
            postStart:
              exec:
                command:
                  - bash
                  - '-c'
                  - >
                    #!/bin/bash

                    # Create the
                    dev.general.logcreation.elasticsearchlogobject.v1.json index

                    ES_URL=http://localhost:9200

                    while [[ "$(curl -s -o /dev/null -w '%{http_code}\n'
                    $ES_URL)" != "200" ]]; do sleep 1; done

                    curl --request PUT --header 'Content-Type: application/json'
                    "$ES_URL/dev.general.logcreation.elasticsearchlogobject.v1.json/"
                    --data
                    '{"mappings":{"properties":{"Properties":{"properties":{"StatusCode":{"type":"text"}}}}},"settings":{"index":{"number_of_shards":"1","number_of_replicas":"0"}}}'
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              drop:
                - ALL
            runAsUser: 1000
            runAsNonRoot: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 120
      dnsPolicy: ClusterFirst
      automountServiceAccountToken: true
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - elasticsearch-master
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      enableServiceLinks: true
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: elasticsearch-master
        creationTimestamp: null
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 4Gi
        volumeMode: Filesystem
      status:
        phase: Pending
  serviceName: elasticsearch-master-headless
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  revisionHistoryLimit: 10

Any ideas?


Solution

  • I finally found the answer; maybe it helps others who face something similar. A pod that initializes endlessly is effectively sleeping. In my case, a strange piece of code in my chart's StatefulSet started causing this issue once security was enabled:

    while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
    

    This never returns 200, because the HTTP endpoint now also expects a username and password for authentication, so the loop keeps sleeping forever.

    So if your pods get stuck in the initializing state and stay there, make sure there is no while/sleep loop like this one (here it sits in the postStart lifecycle hook) blocking startup.
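If the wait loop is still needed, one option is to make it authenticate and give up after a bounded number of attempts instead of sleeping forever. A minimal sketch, assuming the password is available in the `ELASTIC_PASSWORD` environment variable (the one the chart's readiness probe already checks); `wait_for_es`, `SLEEP_SECONDS` and the attempt limit are made-up names for illustration:

```shell
#!/bin/sh
# Sketch: bounded, authenticated replacement for the endless wait loop.
# Assumptions: $1 is the Elasticsearch URL, $2 the maximum number of attempts
# (default 60), ELASTIC_PASSWORD holds the password of the "elastic" user.

wait_for_es() {
  url="$1"
  max_attempts="${2:-60}"
  attempt=0
  code=""

  while [ "$attempt" -lt "$max_attempts" ]; do
    # -k skips certificate verification, which self-signed certificates need.
    if [ -n "${ELASTIC_PASSWORD:-}" ]; then
      code=$(curl -s -k -o /dev/null -w '%{http_code}' \
        -u "elastic:${ELASTIC_PASSWORD}" "$url")
    else
      code=$(curl -s -k -o /dev/null -w '%{http_code}' "$url")
    fi

    if [ "$code" = "200" ]; then
      return 0
    fi

    attempt=$((attempt + 1))
    sleep "${SLEEP_SECONDS:-1}"
  done

  echo "gave up after ${max_attempts} attempts, last HTTP code: ${code}" >&2
  return 1
}
```

Usage could look like `wait_for_es "$ES_URL" 120 || exit 1`. Note that with `xpack.security.http.ssl.enabled` set to true, as in the StatefulSet above, `ES_URL` would presumably also have to use https:// instead of http://.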