
DCOS Upgrade docker version on agent nodes


We are running DC/OS 11.1 on Azure cloud and have Docker engine version 17.09 on our agent nodes. We would like to upgrade Docker engine to 17.12.1 on each agent node.

Has anyone had experience with such a procedure, and would it cause any instability or side effects in the rest of the DC/OS components?


Solution

  • I have not done this upgrade in the exact environment you are running, but I would not be terribly concerned. As always, test it in a non-production environment before you do it in production.

    I would suggest draining the agent node before upgrading Docker. By draining I mean stopping all the containers (tasks) running on the node. The Mesos agent stops the tasks and informs the frameworks that the tasks are no longer running, so the frameworks can take appropriate action.

    To drain a private agent, run:

    sudo sh -c 'systemctl kill -s SIGUSR1 dcos-mesos-slave && systemctl stop dcos-mesos-slave'

    To drain a public agent, run:

    sudo sh -c 'systemctl kill -s SIGUSR1 dcos-mesos-slave-public && systemctl stop dcos-mesos-slave-public'

    You should see the agent disappear from the Nodes section of the UI, and all tasks that were running on it marked as TASK_LOST. Ideally they would be marked TASK_KILLED, but that is a topic for another time.
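
    Since the only difference between the two drain commands is the unit name, the choice can be sketched as a small helper. This is an illustration only: the function names `agent_unit` and `drain_command` are hypothetical, the unit names are the ones used in the commands above, and the helper prints the command for review rather than running it.

```shell
# Map an agent type ("private" or "public") to its systemd unit name.
# The unit names come from the drain commands in this answer.
agent_unit() {
    case "$1" in
        private) echo "dcos-mesos-slave" ;;
        public)  echo "dcos-mesos-slave-public" ;;
        *) echo "unknown agent type: $1" >&2; return 1 ;;
    esac
}

# Print (do not execute) the drain command for the given agent type.
drain_command() {
    unit=$(agent_unit "$1") || return 1
    echo "sudo sh -c 'systemctl kill -s SIGUSR1 $unit && systemctl stop $unit'"
}
```

    For example, `drain_command public` prints the public-agent drain command, which you can inspect before running it on the node.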

    Now perform your Docker upgrade.
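
    Before bringing the agent back, it may be worth confirming that the daemon actually reports the version you intended to install. A minimal sketch, assuming the engine reports the target version either exactly or with a suffix such as `-ce`; the helper name `docker_version_ok` is hypothetical, and the reported version is passed in as an argument so the check itself does not need Docker.

```shell
# Succeed when the reported version is the target itself or the target
# followed by a dash-separated suffix (for example "17.12.1-ce").
# Usage: docker_version_ok REPORTED TARGET
docker_version_ok() {
    reported=$1
    target=$2
    case "$reported" in
        "$target"|"$target"-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

    On the node, the reported version can be read with `sudo docker version --format '{{.Server.Version}}'` and passed as the first argument, with `17.12.1` as the second.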

    After you have upgraded Docker, start the agent service back up. On a private agent:

    sudo systemctl start dcos-mesos-slave

    On a public agent:

    sudo systemctl start dcos-mesos-slave-public

    The node should now reappear in the UI and start accepting tasks.

    To be safe:

    1. Verify this in a non-production environment before you do it in production, to iron out any operational issues you might encounter.
    2. Upgrade one agent, or a small subset of agents, at a time so that you are never left with a cluster without any nodes during the upgrade.
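
    The "subset at a time" advice can be sketched as a helper that splits a list of agents into fixed-size batches. This is only an illustration: the function name `batch_agents` and the host names in the usage note are hypothetical, and the helper only prints the batches; draining, upgrading and restarting each host is left to the steps described above.

```shell
# Print the given hosts in batches of SIZE, one batch per line.
# Usage: batch_agents SIZE HOST...
batch_agents() {
    size=$1
    shift
    count=0
    line=""
    for host in "$@"; do
        # Append the host to the current batch, space-separated.
        line="${line:+$line }$host"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "$line"
            line=""
            count=0
        fi
    done
    # Emit the final, partially filled batch if there is one.
    [ -n "$line" ] && echo "$line"
    return 0
}
```

    For instance, `batch_agents 2 agent-1 agent-2 agent-3` prints `agent-1 agent-2` on the first line and `agent-3` on the second. Finish one batch, and confirm its nodes are back in the UI, before moving on to the next.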