Search code examples
kubernetesgoogle-cloud-platformgoogle-kubernetes-enginekubectlkubernetes-pod

GKE | Statefulset gets deleted along with a service when deployed in the kube-system namespace


I have a GKE cluster of 5 nodes in the same zone. I'm trying to deploy an Elasticsearch statefulset of 3 nodes on the kube-system namespace, but every time I do the statefulset gets deleted and the pods get into the Terminating state immediately after the creation of the second pod.

I tried to check the pod logs and to describe the pod for any information but found nothing useful.

I even checked the GKE cluster logs where I detected the deletion request log but with no extra information of who is initiating it or why is it happening.

When I changed the namespace to default everything was fine and the pods were in the ready state.

Below is the manifest file I'm using for this deployment.

# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "services"
  - "namespaces"
  - "endpoints"
  verbs:
  - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: kube-system
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: elasticsearch-logging
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: 7.16.2
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 2
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: 7.16.2
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: 7.16.2
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
      - image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /data
        env:
#Added by Nour
        - name: discovery.seed_hosts
          value: elasticsearch-master-headless
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: elasticsearch-logging
#        emptyDir: {}
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: alpine:3.6
        command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-logging
    spec:
      storageClassName: "standard"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 30Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  type: NodePort
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
    nodePort: 31335
  selector:
    k8s-app: elasticsearch-logging
#Added by Nour
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  selector:
    app: elasticsearch-master
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master-headless
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  clusterIP: None
  selector:
    app: elasticsearch-master

Below are the available namespaces

$ kubectl get ns
NAME              STATUS   AGE
default           Active   4d15h
kube-node-lease   Active   4d15h
kube-public       Active   4d15h
kube-system       Active   4d15h

Am I using any old API version that might cause the issue?

Thank you.


Solution

  • To close i think it would make sense to paste the final answer here.

    
    I understand your curiousity, i guess GCP just started preventing people from deploying stuff to the kube-system namespaces as it has the risk of messing with GKE. I never tried to deploy stuff to the kube-system namespace before so i'm sure if it was always like this or we just changed it
    
    Overall i recommend avoiding deploying stuff into the kube-system namespace in GKE```