Full YAML file is here (not embedded in the question because it's rather long, and because most of the important bits are covered by the describe output below):
https://gist.github.com/sporkmonger/46a820f9a1ed8a73d89a319dffb24608
I'm using a public container image I created: sporkmonger/nsq-k8s:0.3.8.
The container is identical to the official NSQ image, but built on Debian Jessie instead of Alpine/musl, to work around the DNS resolution issues that tend to bite Alpine images on Kubernetes.
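For context, the image is built along these lines (an illustrative sketch, not the exact Dockerfile; the download URL follows NSQ's published release naming and is an assumption here):

# Sketch of the sporkmonger/nsq-k8s image -- illustrative only.
# The point is the base: Debian (glibc) instead of Alpine (musl),
# which sidesteps musl's DNS resolver quirks under kube-dns.
FROM debian:jessie

RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*

# Install the official nsq 0.3.8 release binaries (URL assumed from
# NSQ's standard download pattern).
RUN curl -fsSL https://s3.amazonaws.com/bitly-downloads/nsq/nsq-0.3.8.linux-amd64.go1.6.2.tar.gz \
  | tar -xz --strip-components=2 -C /usr/local/bin

EXPOSE 4150 4151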
Here's what happens when I describe one of the pods:
❯ kubectl describe pod nsqd-0
Name:                       nsqd-0
Namespace:                  default
Node:                       minikube/192.168.99.100
Start Time:                 Sun, 04 Dec 2016 20:58:06 -0800
Labels:                     app=nsq
Status:                     Terminating (expires Sun, 04 Dec 2016 21:02:31 -0800)
Termination Grace Period:   60s
IP:                         172.17.0.8
Controllers:                PetSet/nsqd
Containers:
  nsqd:
    Container ID:           docker://381e4a1313e4e13a63b8a17004d79a6e828a8bc1c9e20419b319f8a9757f266b
    Image:                  sporkmonger/nsq-k8s:0.3.8
    Image ID:               docker://sha256:01691a91cee3e1a6992b33a10e99baa57c5b8ce7b765849540a830f0b554e707
    Ports:                  4150/TCP, 4151/TCP
    Command:
      /bin/sh
      -c
    Args:
      /usr/local/bin/nsqd
      -data-path
      /data
      -broadcast-address
      $(hostname -f)
      -lookupd-tcp-address
      nsqlookupd-0.nsqlookupd.default.svc.cluster.local:4160
      -lookupd-tcp-address
      nsqlookupd-1.nsqlookupd.default.svc.cluster.local:4160
      -lookupd-tcp-address
      nsqlookupd-2.nsqlookupd.default.svc.cluster.local:4160
    State:                  Running
      Started:              Sun, 04 Dec 2016 20:58:11 -0800
    Ready:                  True
    Restart Count:          0
    Liveness:               http-get http://:http/ping delay=5s timeout=1s period=10s #success=1 #failure=3
    Readiness:              http-get http://:http/ping delay=1s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /data from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-k6ufj (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-nsqd-0
    ReadOnly:   false
  default-token-k6ufj:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-k6ufj
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath          Type    Reason     Message
  ---------  --------  -----  ----                  -------------          ----    ------     -------
  4m         4m        1      {default-scheduler }                         Normal  Scheduled  Successfully assigned nsqd-0 to minikube
  4m         4m        1      {kubelet minikube}    spec.containers{nsqd}  Normal  Pulling    pulling image "sporkmonger/nsq-k8s:0.3.8"
  4m         4m        1      {kubelet minikube}    spec.containers{nsqd}  Normal  Pulled     Successfully pulled image "sporkmonger/nsq-k8s:0.3.8"
  4m         4m        1      {kubelet minikube}    spec.containers{nsqd}  Normal  Created    Created container with docker id 381e4a1313e4; Security:[seccomp=unconfined]
  4m         4m        1      {kubelet minikube}    spec.containers{nsqd}  Normal  Started    Started container with docker id 381e4a1313e4
  0s         0s        1      {kubelet minikube}    spec.containers{nsqd}  Normal  Killing    Killing container with docker id 381e4a1313e4: Need to kill pod.
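For reference, the nsqd container spec behind that output looks roughly like the following (reconstructed from the describe output above, so the port names and the folded args string are assumptions; see the gist for the real thing). The /bin/sh -c wrapper is there so $(hostname -f) is expanded when the container starts, and each pet broadcasts its own stable DNS name:

# Reconstructed from `kubectl describe` above -- approximate, not verbatim.
containers:
- name: nsqd
  image: sporkmonger/nsq-k8s:0.3.8
  ports:
  - name: tcp
    containerPort: 4150
  - name: http                # the probes above reference this port by name
    containerPort: 4151
  command: ["/bin/sh", "-c"]
  # A single folded string, so the shell expands $(hostname -f) at startup.
  args:
  - >-
    /usr/local/bin/nsqd
    -data-path /data
    -broadcast-address $(hostname -f)
    -lookupd-tcp-address nsqlookupd-0.nsqlookupd.default.svc.cluster.local:4160
    -lookupd-tcp-address nsqlookupd-1.nsqlookupd.default.svc.cluster.local:4160
    -lookupd-tcp-address nsqlookupd-2.nsqlookupd.default.svc.cluster.local:4160
  readinessProbe:
    httpGet:
      path: /ping
      port: http
    initialDelaySeconds: 1
  livenessProbe:
    httpGet:
      path: /ping
      port: http
    initialDelaySeconds: 5
  volumeMounts:
  - name: datadir
    mountPath: /data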
A fairly representative watch of about 30 seconds of cluster activity:
❯ kubectl get pods -w
NAME           READY     STATUS              RESTARTS   AGE
nsqadmin-0     1/1       Running             3          33m
nsqadmin-1     1/1       Running             0          32m
nsqd-0         1/1       Running             0          6m
nsqd-1         1/1       Running             0          4m
nsqd-2         1/1       Terminating         0          1m
nsqd-3         1/1       Running             0          30s
nsqlookupd-0   1/1       Running             0          30s
NAME           READY     STATUS              RESTARTS   AGE
nsqlookupd-1   0/1       Pending             0          0s
nsqlookupd-1   0/1       Pending             0          0s
nsqlookupd-1   0/1       ContainerCreating   0          0s
nsqlookupd-1   0/1       Running             0          4s
nsqlookupd-1   1/1       Running             0          8s
nsqlookupd-2   0/1       Pending             0          0s
nsqlookupd-2   0/1       Pending             0          0s
nsqlookupd-2   0/1       ContainerCreating   0          0s
nsqlookupd-2   0/1       Terminating         0          0s
nsqd-2         0/1       Terminating         0          2m
nsqd-2         0/1       Terminating         0          2m
nsqd-2         0/1       Terminating         0          2m
nsqlookupd-2   0/1       Terminating         0          4s
nsqlookupd-2   0/1       Terminating         0          5s
nsqlookupd-2   0/1       Terminating         0          5s
nsqlookupd-2   0/1       Terminating         0          5s
nsqlookupd-1   1/1       Terminating         0          29s
nsqlookupd-1   0/1       Terminating         0          30s
nsqlookupd-1   0/1       Terminating         0          30s
nsqlookupd-1   0/1       Terminating         0          30s
nsqlookupd-0   1/1       Terminating         0          1m
nsqd-2         0/1       Pending             0          0s
nsqd-2         0/1       Pending             0          0s
nsqd-2         0/1       ContainerCreating   0          0s
nsqlookupd-0   0/1       Terminating         0          1m
nsqlookupd-0   0/1       Terminating         0          1m
nsqlookupd-0   0/1       Terminating         0          1m
nsqlookupd-0   0/1       Pending             0          0s
nsqlookupd-0   0/1       Pending             0          0s
nsqlookupd-0   0/1       ContainerCreating   0          0s
nsqd-2         0/1       Running             0          4s
nsqlookupd-0   0/1       Running             0          4s
nsqd-2         1/1       Running             0          6s
nsqlookupd-0   1/1       Running             0          10s
nsqlookupd-0   1/1       Terminating         0          10s
nsqlookupd-0   0/1       Terminating         0          11s
nsqlookupd-0   0/1       Terminating         0          11s
nsqlookupd-0   0/1       Terminating         0          11s
nsqd-2         1/1       Terminating         0          12s
nsqlookupd-0   0/1       Pending             0          0s
nsqlookupd-0   0/1       Pending             0          0s
nsqlookupd-0   0/1       ContainerCreating   0          0s
nsqlookupd-0   0/1       Running             0          3s
nsqlookupd-0   1/1       Running             0          10s
Typical container logs:
❯ kubectl logs nsqd-0
[nsqd] 2016/12/05 05:21:34.666963 nsqd v0.3.8 (built w/go1.6.2)
[nsqd] 2016/12/05 05:21:34.667170 ID: 794
[nsqd] 2016/12/05 05:21:34.667200 NSQ: persisting topic/channel metadata to nsqd.794.dat
[nsqd] 2016/12/05 05:21:34.669232 TCP: listening on [::]:4150
[nsqd] 2016/12/05 05:21:34.669284 HTTP: listening on [::]:4151
[nsqd] 2016/12/05 05:21:35.896901 200 GET /ping (172.17.0.1:51322) 1.511µs
[nsqd] 2016/12/05 05:21:40.290550 200 GET /ping (172.17.0.1:51392) 2.167µs
[nsqd] 2016/12/05 05:21:40.304599 200 GET /ping (172.17.0.1:51394) 1.856µs
[nsqd] 2016/12/05 05:21:50.289018 200 GET /ping (172.17.0.1:51452) 1.865µs
[nsqd] 2016/12/05 05:21:50.299567 200 GET /ping (172.17.0.1:51454) 1.951µs
[nsqd] 2016/12/05 05:22:00.296685 200 GET /ping (172.17.0.1:51548) 2.029µs
[nsqd] 2016/12/05 05:22:00.300842 200 GET /ping (172.17.0.1:51550) 1.464µs
[nsqd] 2016/12/05 05:22:10.295596 200 GET /ping (172.17.0.1:51698) 2.206µs
I'm totally scratching my head over why Kubernetes keeps killing these pods. The containers themselves don't seem to be misbehaving (the probes pass and the logs are clean), and Kubernetes itself appears to be the one doing the terminating...
Figured it out.
My services all have the same selector (app=nsq). Each service therefore matches every pod, causing Kubernetes to think it has too many of each running at once, so it kills the "extras" at random.
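In manifest terms, the bug and the fix look like this (illustrative; the component label name is my own choice, nothing special to Kubernetes). Every service in the gist selected on app: nsq alone, so the nsqd, nsqlookupd, and nsqadmin services each matched all of the pods. The fix is a distinct label per component, on both the service selector and the matching PetSet pod template:

# Before (broken): every service used this selector, so each one
# matched every pod across all three components.
#   selector:
#     app: nsq
#
# After (fixed): each component gets its own label value.
apiVersion: v1
kind: Service
metadata:
  name: nsqd
spec:
  clusterIP: None            # headless, so each pet gets a stable DNS name
  selector:
    app: nsq
    component: nsqd          # nsqlookupd and nsqadmin get their own values
  ports:
  - name: tcp
    port: 4150
  - name: http
    port: 4151

With non-overlapping selectors, each service and controller only claims its own pods, and the random Terminating churn visible in the watch output above stops.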