I want to run an initialisation script on each node, and I want it to run only once per node.
Here is the YAML I use to do some basic initialisation on each node. The problem: once the initialisation script finishes, the container exits with exit code 0,
and the daemonset restarts the pod, running the initialisation script again and again.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-init-node-cr
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-init-node-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-init-node-cr
subjects:
  - kind: ServiceAccount
    name: test-init-node-sa
    namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-init-node-sa
  namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-init-node
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: test-init-node
      app.kubernetes.io/component: configurator
  # replicas: 3
  template:
    metadata:
      name: test-init-node
      labels:
        app.kubernetes.io/name: test-init-node
        app.kubernetes.io/component: configurator
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: k8s.amazee.io/node-configured
                    operator: DoesNotExist
      hostPID: true
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      serviceAccount: test-init-node-sa
      containers:
        - name: init
          env:
            - name: MY_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - nsenter
            - --mount=/proc/1/ns/mnt
            - --
            - bash
            - -xc
            - |
              echo "starting the magic"
              echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
              echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
          image: alpine/k8s:1.28.0
          resources:
            requests:
              cpu: 50m
              memory: 50M
          securityContext:
            runAsUser: 0
            privileged: true
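As an aside, restartPolicy cannot solve this: a daemonset's pod template only supports restartPolicy: Always, so the kubelet will always restart the container once the script exits. Anything else is rejected at apply time (a sketch; the exact error wording may vary by Kubernetes version):

spec:
  template:
    spec:
      restartPolicy: OnFailure
      # rejected by the API server, along the lines of:
      # spec.template.spec.restartPolicy: Unsupported value: "OnFailure":
      # supported values: "Always"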
Is there any way for me to prevent the daemonset from restarting the pod when it exits? i.e. to ensure that the initialisation only happens once per node.
I tried adding a preStop hook, but it does not seem to have any effect (as far as I understand, preStop only runs when the kubelet terminates the container, not when the container exits on its own). The idea was that once k8s.amazee.io/node-configured is set, the daemonset would no longer schedule onto that node.
preStop:
  exec:
    command:
      - /bin/sh
      - -c
      - kubectl label node "$MY_NODE_NAME" k8s.amazee.io/node-configured=$(date +%s)
Neither does appending the label command after a semicolon (expected, admittedly, but I thought why not give it a try):
command:
  - nsenter
  - --mount=/proc/1/ns/mnt
  - --
  - bash
  - -xc
  - |
    echo "starting the magic"
    echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
    echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
  - ;
  - /bin/sh
  - -c
  - kubectl label node "$MY_NODE_NAME" k8s.amazee.io/node-configured=$(date +%s)
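The semicolon cannot work because each YAML list item becomes a separate argv entry, not shell syntax: everything after the -c script string is handed to bash as positional parameters ($0, $1, ...) and never executed. The same argv shape on a plain shell shows it:

bash -xc 'echo "only this runs"' ';' /bin/sh -c 'echo "never runs"'
# prints "only this runs"; ';' lands in $0 and the rest in $1..$3 of the script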
Keeping the daemonset pod running after the script (instead of letting it exit) works, but the idle pod still takes up resources on every node and does not feel elegant.
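For reference, that keep-alive variant is just the same init command with a trailing sleep (a sketch; sleep infinity is simply a long-running no-op to park the container):

command:
  - nsenter
  - --mount=/proc/1/ns/mnt
  - --
  - bash
  - -xc
  - |
    echo "starting the magic"
    echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
    echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
    sleep infinity  # never exit, so the kubelet never restarts the container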
From @norbjd's answer, I saw this in the GCP tutorial:
initContainers:
  - image: ubuntu:18.04
    name: node-initializer
    command: ["/scripts/entrypoint.sh"]
    env:
      - name: ROOT_MOUNT_DIR
        value: /root
    securityContext:
      privileged: true
containers:
  - image: "gcr.io/google-containers/pause:2.0"
    name: pause
The tutorial uses the pause container from google-containers to avoid a restart of the pod. However, what caught my eye was the initContainers.

Init containers are exactly like regular containers, except:
Init containers always run to completion.
Each init container must complete successfully before the next one starts.

This gave me an idea: what if I ran 2 containers, the first being an initContainer that does all the initialisation, and the second a regular container that adds a label to prevent scheduling, hence stopping any further pod creation/restart of the daemonset for that particular node?

Of course, by the same logic both could be initContainers, but in my case I used 1 initContainer and 1 regular container; since regular containers wait for all initContainers to complete, the result is the same.
Working example:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-init-node-cr
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-init-node-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-init-node-cr
subjects:
  - kind: ServiceAccount
    name: test-init-node-sa
    namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-init-node-sa
  namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-init-node
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: test-init-node
      app.kubernetes.io/component: configurator
  # replicas: 3
  template:
    metadata:
      name: test-init-node
      labels:
        app.kubernetes.io/name: test-init-node
        app.kubernetes.io/component: configurator
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test-init-node-date
                    operator: DoesNotExist
      hostPID: true
      hostNetwork: true
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      serviceAccount: test-init-node-sa
      initContainers:
        - name: init
          command:
            - nsenter
            - --mount=/proc/1/ns/mnt
            - --
            - bash
            - -xc
            - |
              echo "starting the magic"
              echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
              echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
              echo "user00 soft core unlimited" >> /etc/security/limits.d/game.conf
          image: alpine/k8s:1.28.0
          resources:
            requests:
              cpu: 50m
              memory: 50M
          securityContext:
            runAsUser: 0
            privileged: true
      containers:
        - name: add-label-to-remove-scheduling
          env:
            - name: MY_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - sh
            - -c
            - |
              kubectl label node "$MY_NODE_NAME" test-init-node-date=$(date +%s)
          image: alpine/k8s:1.28.0
          resources:
            requests:
              cpu: 50m
              memory: 50M
          securityContext:
            runAsUser: 0
            privileged: true
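One caveat: if the labelling container restarts before the daemonset controller removes the pod, the second kubectl label run fails because the label already has a value. Adding --overwrite makes the command idempotent:

kubectl label node "$MY_NODE_NAME" test-init-node-date=$(date +%s) --overwrite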
Rough Explanation:
- check whether the test-init-node-date label is set on the node; if it is, skip and do nothing
- otherwise the initContainers run the init script as needed
- then the containers add the test-init-node-date label

sample labels:
kubernetes.io/os=linux
node.kubernetes.io/instance-type=n2d-standard-8
test-init-node-date=1693280374
The daemonset then runs the init container once per node, after which the main container adds the test-init-node-date label. Since the test-init-node-date label is now set, no new pods will be scheduled on that node by the daemonset.
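To check which nodes have already been configured, you can print the label as a column (-L shows the label's value per node):

kubectl get nodes -L test-init-node-date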
And finally, to quote norbjd: to prevent an accidental re-run of the init script (e.g. if someone deletes the label), you can add a safeguard check before running it:
if [ ! -f /etc/game-conf-limits-updated ]
then
echo "starting the magic"
echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
touch /etc/game-conf-limits-updated
fi
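Wired into the daemonset, the init container command then becomes (the same script as above, just wrapped in the guard):

command:
  - nsenter
  - --mount=/proc/1/ns/mnt
  - --
  - bash
  - -xc
  - |
    if [ ! -f /etc/game-conf-limits-updated ]
    then
      echo "starting the magic"
      echo "* hard core unlimited" >> /etc/security/limits.d/game.conf
      echo "* soft core unlimited" >> /etc/security/limits.d/game.conf
      touch /etc/game-conf-limits-updated  # marker file guards against re-runs
    fi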