Tags: postgresql, kubernetes, k3s

How to safely restart a PostgreSQL container in Kubernetes?


I have a small hobby website I want to release on my server. I chose to use Kubernetes for that, as I partly use it at work as well, so I'd like to get more familiar with it. I bought a Hetzner Debian server and installed k3s. Now I am deploying a PostgreSQL container (version 15.2, not a cluster) as per this tutorial. (I made minor changes I saw in other tutorials; they should not be relevant.)

It ran fine and I was happy with it. But then I restarted the deployment to make sure data would not be lost if the server went down for some reason. After a few restarts, the database was corrupted.

Once I saw:

PANIC:  invalid magic number 0000 in log segment 000000010000000000000000, offset 0

another time:

invalid contrecord length 1174 (expected 48430224) at 0/195BC90

another time:

PANIC:  could not locate a valid checkpoint record

When I tried to google how to recover from this, I did not find any safe options; most of the suggestions were to restore a backup.

So my question is: how do I safely restart/shut down a PostgreSQL container? Am I missing some shutdown config for the PostgreSQL pod in k8s?

Update 1:

I was restarting the deployment from k9s with the r command. I think the UI made it look like the pod was rotated right away, but it probably takes some time. So I think I triggered multiple restarts every 10 seconds and that might have corrupted the DB. Anyway, I added terminationGracePeriodSeconds: 60 and used the preStop hook from the answer. Thanks.
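For reference, this is roughly how those two settings fit together in the Deployment's pod template; the image and data path here are just illustrative, not my exact manifest:

    spec:
      template:
        spec:
          # grace period needs to cover the preStop hook's pg_ctl timeout
          terminationGracePeriodSeconds: 60
          containers:
            - name: postgres
              image: postgres:15.2
              lifecycle:
                preStop:
                  exec:
                    command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-w", "-t", "60", "-m", "fast"]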

Update 2: I imported the DB, did a restart, and again hit the same issue:

could not locate a valid checkpoint record

Update 3: I replaced the Deployment with a StatefulSet and it seems to handle restarts better. I tried over 10 restarts with no issues, whereas before it crashed around the 4th restart.


Solution

  • Of course the best practice is to use an operator like cloudnative-pg or postgres-operator, but they are pretty big and probably have way more features than a simple workload needs. Here is a simple solution for your problem.

    Solution

    Add the following to the container spec in your pod template:

    lifecycle:
      preStop:
        exec:
          command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-w", "-t", "60", "-m", "fast"]
    

    Explanation

    Basically, when you kill a pod, Kubernetes sends SIGTERM and gives your pod 30 seconds; after that it sends SIGKILL. When Postgres receives SIGTERM it stops accepting new connections, but it does not terminate existing connections, so any connected client will block the database's shutdown, and after 30 seconds the pod receives SIGKILL, which is very bad for Postgres according to the docs. So you need a way to shut Postgres down safely, and the preStop hook gives you one.

    Kubernetes

    This is the exact chronological order of events for your pod:

    1. The Pod controller sets state=Terminating
    2. The terminationGracePeriodSeconds timer starts (default is 30 seconds)
    3. The preStop hook runs: pg_ctl ...
    4. SIGTERM is sent: Postgres won't accept new connections
    5. k8s waits until terminationGracePeriodSeconds expires (configurable in the YAML)
    6. If the app is still alive, SIGKILL is sent

    Also you need to set .spec.strategy.type to Recreate in the Deployment.
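    In the Deployment manifest that is just the following fragment (only the relevant fields shown):

    spec:
      strategy:
        type: Recreate   # the old pod is fully terminated before a new one is created on rollouts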

    Postgres

    For the pg_ctl commands you can refer to this summary; the most useful one for you looks like -m fast.

    SIGTERM:

    • "Smart Shutdown Mode"
    • Disallows new connections
    • Lets existing connections continue

    SIGINT:

    • "Fast Shutdown Mode"
    • Disallows new connections
    • Sends SIGTERM to existing server processes, which exit promptly

    SIGQUIT:

    • "Immediate Shutdown Mode"
    • Sends SIGQUIT to all child processes; if they don't terminate within 5 seconds, sends SIGKILL
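    If you want a different mode in the preStop hook, pg_ctl's -m flag selects it; a sketch of the alternatives (pick one command line, the others are shown as comments):

    preStop:
      exec:
        # smart: wait for clients to disconnect first (SIGTERM behaviour)
        # command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-m", "smart"]
        # immediate: abort without a clean shutdown (SIGQUIT behaviour); needs crash recovery on next start
        # command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-m", "immediate"]
        # fast: roll back open transactions and shut down cleanly (SIGINT behaviour)
        command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-m", "fast"]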

    EDIT:

    Apparently Recreate only guarantees recreation for updates (old ReplicaSet to new ReplicaSet); it does not guarantee one pod at a time if a pod randomly dies. While the new pod is being created, the old one may still be in the Terminating phase, and because of this race condition the data may become corrupted. Relevant doc:

    This will only guarantee Pod termination previous to creation for upgrades. If you upgrade a Deployment, all Pods of the old revision will be terminated immediately. Successful removal is awaited before any Pod of the new revision is created. If you manually delete a Pod, the lifecycle is controlled by the ReplicaSet and the replacement will be created immediately (even if the old Pod is still in a Terminating state). If you need an "at most" guarantee for your Pods, you should consider using a StatefulSet.
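
    If you do move to a StatefulSet, a minimal sketch could look like the following (names, storage size and image are illustrative, not taken from your tutorial):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres              # headless Service the StatefulSet attaches to
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          terminationGracePeriodSeconds: 60
          containers:
            - name: postgres
              image: postgres:15.2
              # env (POSTGRES_PASSWORD etc.) and resource limits omitted for brevity
              ports:
                - containerPort: 5432
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
              lifecycle:
                preStop:
                  exec:
                    command: ["/usr/local/bin/pg_ctl", "stop", "-D", "/var/lib/postgresql/data", "-w", "-t", "60", "-m", "fast"]
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi

    With a single replica the StatefulSet also gives you the "at most one" behaviour mentioned above: the old pod must be fully gone before its replacement is started.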