docker digital-ocean systemd coreos

All containers inside a Digital Ocean droplet stop


I have a Digital Ocean droplet where I run 4 containers, each with one small Python application.

From time to time (once every week or two), all the containers just stop working. It's not caused by the Python apps inside them.

I've made a systemd timer that executes a bash script every 30 minutes to check whether the containers are running and, if not, starts them. The timer worked for days, and it never had to restart a container.

But one day I SSHed into my droplet and saw that the containers were stopped, and systemctl list-timers --all showed that the timer had disappeared from the system timers. It was just not there anymore!

The container-checking script was writing logs, and the logs stop at the same time the containers stopped.

Questions:

  1. How do I figure out what stops my containers?

  2. How is it possible that the systemd timer just disappeared?

  3. How do I fix this?

I am the only one who can ssh to that droplet, so nobody else could have messed it up.


Solution

  • CoreOS machines reboot themselves when a new version of the operating system becomes available. That means that if you start a process on a CoreOS machine manually, it will stay down after the next such reboot.

    The good news is that there is a standard way to run processes on CoreOS so that they come back up when the machine does: systemd units. CoreOS describes what units are and how to use them here: https://coreos.com/docs/launching-containers/launching/getting-started-with-systemd/

    Briefly, you can create your own unit in two steps.

    First, put a unit file in /etc/systemd/system, for example /etc/systemd/system/hello.service. The simplest one is probably something like

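    [Unit]
    Description=MyApp
    # Order this unit after the Docker daemon and pull Docker in as a dependency
    After=docker.service
    Requires=docker.service
    
    [Service]
    # docker run stays in the foreground, so systemd can track the container
    ExecStart=/usr/bin/docker run mycontainer
    
    [Install]
    # Once enabled, the unit is started on every normal boot
    WantedBy=multi-user.target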
    

    Then enable the unit so that systemd starts it on every boot, and start it right away, with

    $ sudo systemctl enable hello.service
    $ sudo systemctl start hello.service
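
    One possible explanation for the vanished timer, offered as a guess rather than a diagnosis, is that it was started but never enabled, so it did not come back after a reboot. A minimal sketch for checking both states of a unit follows. The function name unit_state is made up for this example, and hello.service is only a placeholder unit name.

    ```shell
    # Report whether a unit is set to start at boot and whether it is running now.
    unit_state() {
        unit="$1"
        # is-enabled prints the enablement state (e.g. "enabled", "disabled");
        # it prints nothing on stdout if the unit file cannot be found.
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        # is-active prints the runtime state (e.g. "active", "inactive").
        active=$(systemctl is-active "$unit" 2>/dev/null) || true
        echo "$unit enabled=${enabled:-unknown} active=${active:-unknown}"
    }

    # Usage on the droplet:
    #   unit_state hello.service
    ```

    The same check works for a .timer unit. A unit that reports as running but not enabled is exactly the kind that stays down after a reboot.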
    

    The document in the link has a lot more detail. I'd strongly recommend taking a look at it before going ahead; it's short!
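
    To check whether a reboot is what stopped the containers, you can compare the time at which the container-checking logs stop with the machine's boot times. The following is a rough sketch, not a definitive procedure. The function name show_boots is made up for this example, and earlier boots are listed only if the journal is stored persistently.

    ```shell
    # Print evidence of reboots for comparison with the time the logs stop.
    show_boots() {
        # Boot time of the running system, in YYYY-MM-DD HH:MM:SS form.
        echo "current boot: $(uptime -s)"
        # One line per boot recorded in the journal.
        journalctl --list-boots --no-pager
    }

    # Usage on the droplet:
    #   show_boots
    ```

    If a boot time lines up with the moment the logs stop, an automatic update reboot is a likely cause. On CoreOS the update services are, as far as I know, update-engine.service and locksmithd.service, so journalctl -u update-engine.service may show the update that triggered it.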