Tags: charts, mongodb-replica-set, kubernetes-helm

Helm chart "stable/mongodb-replicaset" is stuck when the 2nd pod bootstraps / is added to the replica set


Version of Helm and Kubernetes:

Kubernetes (GKE): Client Version: v1.9.7 Server Version: 1.10.6-gke.2

Helm: Client: v2.10.0 Server: v2.10.0

Which chart:

stable/mongodb-replicaset

What happened:

Summary: the 1st pod starts correctly; the 2nd pod is stuck at "Init:2/3".

Details: I want to set up a MongoDB replica set with 3 replicas, using authentication and TLS with X.509 for cluster member authentication. Here are the contents of my values.yaml file used for helm install:

replicas: 3
port: 27018

replicaSetName: rs0

podDisruptionBudget: {}
  # maxUnavailable: 1
  # minAvailable: 2

auth:
  enabled: true
  adminUser: admin
  adminPassword: pass1234
  metricsUser: metrics
  metricsPassword: pass1234
  key: abcdefghijklmnopqrstuvwxyz1234567890
  #existingKeySecret:
  #existingAdminSecret:
  #existingMetricsSecret:

# Specs for the Docker image for the init container that establishes the replica set
installImage:
  repository: k8s.gcr.io/mongodb-install
  tag: 0.6
  pullPolicy: IfNotPresent

# Specs for the MongoDB image
image:
  repository: mongo
  #tag: 3.6
  tag: latest
  pullPolicy: IfNotPresent

# Additional environment variables to be set in the container
extraVars: {}
# - name: TCMALLOC_AGGRESSIVE_DECOMMIT
#   value: "true"

# Prometheus Metrics Exporter
metrics:
  enabled: false
  image:
    repository: ssalaues/mongodb-exporter
    tag: 0.6.1
    pullPolicy: IfNotPresent
  port: 9216
  path: "/metrics"
  socketTimeout: 3s
  syncTimeout: 1m
  prometheusServiceDiscovery: true
  resources: {}

# Annotations to be added to MongoDB pods
podAnnotations: {}

securityContext:
  runAsUser: 999
  fsGroup: 999
  runAsNonRoot: true

resources:
  limits:
    # cpu: 100m
    memory: 512Mi
  requests:
    # cpu: 100m
    memory: 256Mi

## Node selector
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
nodeSelector: {}

affinity: {}

tolerations: []

extraLabels: {}

persistentVolume:
  enabled: true
  ## mongodb-replicaset data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner.  (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  annotations: {}

# Annotations to be added to the service
serviceAnnotations: {}

tls:
  # Enable or disable MongoDB TLS support
  enabled: true
  # Please generate your own TLS CA by generating it via:
  # $ openssl genrsa -out ca.key 2048
  # $ openssl req -x509 -new -nodes -key ca.key -days 10000 -out ca.crt -subj "/CN=mydomain.com"
  # After that you can base64 encode it and paste it here:
  # $ cat ca.key | base64 -w0
  cacert: base64 encoded ca certificate goes here
  cakey: base64 encoded ca key goes here

# Entries for the MongoDB config file
configmap:
  storage:
    dbPath: /data/db
  net:
    port: 27018
    ssl:
      mode: requireSSL
      CAFile: /data/configdb/tls.crt
      PEMKeyFile: /work-dir/mongo.pem
  replication:
    replSetName: rs0
  security:
    authorization: enabled
    clusterAuthMode: x509
    keyFile: /data/configdb/key.txt

# Readiness probe
readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1

# Liveness probe
livenessProbe:
  initialDelaySeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
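
As a sanity check, the chart can be rendered locally to verify the mongod.conf and init ConfigMaps these values produce (a sketch using Helm 2's client-side rendering; nothing gets installed):

helm fetch --untar stable/mongodb-replicaset
helm template --name mongo-test -f values.yaml ./mongodb-replicaset | less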

I've installed the chart using:

helm install --name mongo-test -f values.yaml stable/mongodb-replicaset

The helm install itself completes without any problems - no error messages during install:

NAME:   mongo-test
LAST DEPLOYED: Wed Aug 29 16:40:43 2018
NAMESPACE: b2c
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME                                   TYPE               DATA  AGE
mongo-test-mongodb-replicaset-admin    Opaque             2     0s
mongo-test-mongodb-replicaset-ca       kubernetes.io/tls  2     0s
mongo-test-mongodb-replicaset-keyfile  Opaque             1     0s

==> v1/ConfigMap
NAME                                   DATA  AGE
mongo-test-mongodb-replicaset-init     1     0s
mongo-test-mongodb-replicaset-mongodb  1     0s
mongo-test-mongodb-replicaset-tests    1     0s

==> v1/Service
NAME                           TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)    AGE
mongo-test-mongodb-replicaset  ClusterIP  None        <none>       27018/TCP  0s

==> v1beta2/StatefulSet
NAME                           DESIRED  CURRENT  AGE
mongo-test-mongodb-replicaset  3        1        0s

==> v1/Pod(related)
NAME                             READY  STATUS   RESTARTS  AGE
mongo-test-mongodb-replicaset-0  0/1    Pending  0         0s

The 1st pod then starts correctly and without any problems, but the 2nd pod is stuck at "Init:2/3":

NAME                                          READY     STATUS     RESTARTS   AGE
po/mongo-test-mongodb-replicaset-0            1/1       Running    0          5m
po/mongo-test-mongodb-replicaset-1            0/1       Init:2/3   0          5m
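
"Init:2/3" means that two of the three init containers have finished and the third (bootstrap, which runs on-start.sh) is still running. kubectl describe shows the per-container state (a sketch, assuming the release lives in the b2c namespace shown above):

kubectl -n b2c describe pod mongo-test-mongodb-replicaset-1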

When I connect to the bootstrap init container of mongo-test-mongodb-replicaset-1 (-c bootstrap), I find the following inside /work-dir/log.txt:

mongodb@mongo-test-mongodb-replicaset-1:/work-dir$ more log.txt
[2018-08-29T14:41:51,684293796+00:00] [on-start.sh] Bootstrapping MongoDB replica set member: mongo-test-mongodb-replicaset-1
[2018-08-29T14:41:51,687394595+00:00] [on-start.sh] Reading standard input...
[2018-08-29T14:41:51,688594499+00:00] [on-start.sh] Generating certificate
[2018-08-29T14:41:51,951181683+00:00] [on-start.sh] Peers: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:51,952080311+00:00] [on-start.sh] Starting a MongoDB instance...
[2018-08-29T14:41:51,953075555+00:00] [on-start.sh] Waiting for MongoDB to be ready...
2018-08-29T14:41:52.020+0000 I CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] MongoDB starting : pid=30 port=27017 dbpath=/data/db 64-bit host=mongo-test-mongodb-replicaset-1
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] db version v4.0.1
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] modules: none
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] build environment:
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-08-29T14:41:52.045+0000 I CONTROL  [initandlisten] options: { config: "/data/configdb/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017, ssl: { CAFile: "/data/configdb/tls.crt", PEMKeyFile: "/work-dir/mongo.pem", mode: "requireSSL" } }, replication: { replSet: "rs0" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/data/configdb/key.txt" }, storage: { dbPath: "/data/db" } }
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten]
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-08-29T14:41:52.047+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-08-29T14:41:52.048+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1337M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
[2018-08-29T14:41:52,083645436+00:00] [on-start.sh] Retrying...
2018-08-29T14:41:52.789+0000 I STORAGE  [initandlisten] WiredTiger message [1535553712:789699][30:0x7fd33c091a00], txn-recover: Set global recovery timestamp: 0
2018-08-29T14:41:52.800+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2018-08-29T14:41:52.819+0000 I STORAGE  [initandlisten] createCollection: local.startup_log with generated UUID: 2a15f25a-5f7b-47d3-b1a3-2338677428d0
2018-08-29T14:41:52.832+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2018-08-29T14:41:52.833+0000 I STORAGE  [initandlisten] createCollection: local.replset.oplogTruncateAfterPoint with generated UUID: ae87a21f-d2dc-4474-b4de-d70d95b7a2a8
2018-08-29T14:41:52.847+0000 I STORAGE  [initandlisten] createCollection: local.replset.minvalid with generated UUID: ffaf5c08-356c-4ed0-b4e0-1b3d9cdeea92
2018-08-29T14:41:52.866+0000 I REPL     [initandlisten] Did not find local voted for document at startup.
2018-08-29T14:41:52.866+0000 I REPL     [initandlisten] Did not find local Rollback ID document at startup. Creating one.
2018-08-29T14:41:52.866+0000 I STORAGE  [initandlisten] createCollection: local.system.rollback.id with generated UUID: 6e3e4fc7-b821-4df6-9c32-7db2af4a3bc4
2018-08-29T14:41:52.880+0000 I REPL     [initandlisten] Initialized the rollback ID to 1
2018-08-29T14:41:52.880+0000 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2018-08-29T14:41:52.881+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: Replication has not yet been configured
2018-08-29T14:41:52.881+0000 I NETWORK  [initandlisten] waiting for connections on port 27017 ssl
2018-08-29T14:41:54.148+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50022 #1 (1 connection now open)
2018-08-29T14:41:54.154+0000 I ACCESS   [conn1] note: no users configured in admin.system.users, allowing localhost access
2018-08-29T14:41:54.154+0000 I NETWORK  [conn1] received client metadata from 127.0.0.1:50022 conn1: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.157+0000 I ACCESS   [conn1] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("95760764-878a-4dd0-8ce6-470182c48a3a") } ], $db: "admin" }
2018-08-29T14:41:54.158+0000 I NETWORK  [conn1] end connection 127.0.0.1:50022 (0 connections now open)
[2018-08-29T14:41:54,162283712+00:00] [on-start.sh] Initialized.
[2018-08-29T14:41:54,267062979+00:00] [on-start.sh] Found master: mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local
[2018-08-29T14:41:54,268004950+00:00] [on-start.sh] Adding myself (mongo-test-mongodb-replicaset-1.mongo-test-mongodb-replicaset.b2c.svc.cluster.local) to replica set...
2018-08-29T14:41:54.368+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:50550 #2 (1 connection now open)
2018-08-29T14:41:54.371+0000 I ACCESS   [conn2]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.371+0000 I ACCESS   [conn2] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50550 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
2018-08-29T14:41:54.372+0000 I NETWORK  [conn2] end connection 10.43.67.35:50550 (0 connections now open)
2018-08-29T14:41:54.375+0000 I NETWORK  [listener] connection accepted from 10.43.67.35:50552 #3 (1 connection now open)
2018-08-29T14:41:54.378+0000 I NETWORK  [conn3] received client metadata from 10.43.67.35:50552 conn3: { driver: { name: "NetworkInterfaceTL", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:54.379+0000 I ACCESS   [conn3]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=mongo-test-mongodb-replicaset-0", $db: "$external" }
2018-08-29T14:41:54.379+0000 I ACCESS   [conn3] Failed to authenticate CN=mongo-test-mongodb-replicaset-0@$external from client 10.43.67.35:50552 with mechanism MONGODB-X509: UserNotFound: Could not find user CN=mongo-test-mongodb-replicaset-0@$external
[2018-08-29T14:41:57,388632734+00:00] [on-start.sh] Waiting for replica to reach SECONDARY state...
2018-08-29T14:41:57.441+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50032 #4 (2 connections now open)
2018-08-29T14:41:57.446+0000 I NETWORK  [conn4] received client metadata from 127.0.0.1:50032 conn4: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:57.448+0000 I ACCESS   [conn4] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:57.448+0000 I ACCESS   [conn4] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50032 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:57.450+0000 I ACCESS   [conn4] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("38098891-2b34-46eb-aef2-53a69416671f") } ], $db: "admin" }
2018-08-29T14:41:57.451+0000 I NETWORK  [conn4] end connection 127.0.0.1:50032 (1 connection now open)
2018-08-29T14:41:58.504+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50034 #5 (2 connections now open)
2018-08-29T14:41:58.508+0000 I NETWORK  [conn5] received client metadata from 127.0.0.1:50034 conn5: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-08-29T14:41:58.509+0000 I ACCESS   [conn5] Supported SASL mechanisms requested for unknown user 'admin@admin'
2018-08-29T14:41:58.510+0000 I ACCESS   [conn5] SASL SCRAM-SHA-1 authentication failed for admin on admin from client 127.0.0.1:50034 ; UserNotFound: Could not find user admin@admin
2018-08-29T14:41:58.511+0000 I ACCESS   [conn5] Unauthorized: not authorized on admin to execute command { endSessions: [ { id: UUID("e1089881-4c47-450c-9721-6b291c6f0e50") } ], $db: "admin" }
2018-08-29T14:41:58.512+0000 I NETWORK  [conn5] end connection 127.0.0.1:50034 (1 connection now open)
2018-08-29T14:41:59.574+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:50036 #6 (2 connections now open)
2018-08-29T14:41:59.582+0000 I NETWORK  [conn6] received client metadata from 127.0.0.1:50036 conn6: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }

....to be continued...
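
The failing X509 logins above use the certificate that the bootstrap script generated into /work-dir/mongo.pem. Its subject can be inspected from the stuck pod (a sketch, assuming openssl is available in the bootstrap image):

kubectl -n b2c exec mongo-test-mongodb-replicaset-1 -c bootstrap -- openssl x509 -in /work-dir/mongo.pem -noout -subject

If the output shows nothing but a bare CN, that matches the failing "user: CN=mongo-test-mongodb-replicaset-0" entries in the log above.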


Solution

  • Just in case someone with the same problem stumbles in here: the root cause is the way the X.509 certificates are generated by the Helm chart. There is too little information in the certificates' subject line to satisfy MongoDB's requirements, so the certificates cannot be used for replica set member authentication; this leads to the observed errors and stalls the whole setup process.

    The good news is that there is a workaround, described over on GitHub: https://github.com/helm/charts/issues/7417#issuecomment-422293057

    This still has to be fixed in the Helm chart itself.
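
    For reference: MongoDB requires that member certificates used with clusterAuthMode: x509 specify at least one of the O, OU, or DC attributes in their subject, identical across all members, with only the CN differing per host. A sketch of manually issuing such a member certificate from the CA that was passed to the chart (the subject values myOrg and mongodb are illustrative, not taken from the chart):

    # key and CSR whose subject carries O and OU in addition to the per-host CN
    openssl genrsa -out mongo.key 2048
    openssl req -new -key mongo.key -out mongo.csr \
      -subj "/O=myOrg/OU=mongodb/CN=mongo-test-mongodb-replicaset-0.mongo-test-mongodb-replicaset.b2c.svc.cluster.local"
    # sign it with the CA from values.yaml
    openssl x509 -req -in mongo.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out mongo.crt -days 3650
    # mongod expects key and certificate concatenated into a single PEM file
    cat mongo.key mongo.crt > mongo.pem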