Tags: docker, docker-swarm, ifconfig, mtu

Docker change mtu on the fly?


I have a production Docker swarm with 9 stacks, most of which have volumes. Docker currently runs on a single node.

I need to add a second node, and that is where the problems started. Portainer in particular becomes very laggy, almost unusable. Also, when I move some containers to the new node, my project gets completely stuck: communication between containers on different nodes is the problem. Some requests succeed, but most of them appear to be broken.

After some research I found that the problem seems to be the MTU: eth1 has an MTU of 1450, while Docker's default is 1500.
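
A mismatch like this can be checked before changing anything. The following is only a minimal shell sketch, assuming a Linux host that exposes interface MTUs under /sys/class/net; the helper names (`iface_mtu`, `ping_payload`, `mtu_fits`) and the optional sysfs directory argument are hypothetical and not part of Docker.

```shell
#!/bin/sh
# iface_mtu IFACE [SYSFS_NET_DIR]
# Print the MTU that Linux reports for IFACE.
# SYSFS_NET_DIR is a hypothetical override, useful mainly for testing.
iface_mtu() {
    cat "${2:-/sys/class/net}/$1/mtu"
}

# ping_payload MTU
# Largest ICMP echo payload that fits in one IPv4 packet of size MTU:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
ping_payload() {
    echo $(( $1 - 28 ))
}

# mtu_fits IFACE DOCKER_MTU [SYSFS_NET_DIR]
# Succeed only if DOCKER_MTU does not exceed the MTU of IFACE.
mtu_fits() {
    [ "$2" -le "$(iface_mtu "$1" "$3")" ]
}
```

For example, `mtu_fits eth1 1500 || echo "Docker MTU is larger than eth1"` would flag the situation described here. With iputils ping, `ping -M do -s "$(ping_payload 1450)" <other-node>` sends packets that may not be fragmented, so it should only succeed if the path really carries 1450-byte packets.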

The question is: is it possible to change Docker's MTU to 1450 on the fly?

I tried:

  1. Adding --mtu=1450 to the dockerd command line: the docker service didn't start at all.
  2. Changing the main network's MTU in docker-compose: it doesn't seem to have been applied; I think the network would have to be recreated.
  3. Adding the mtu option to /etc/docker/daemon.json: this also seemed to have no effect.

How can I change the MTU on a running production server? A downtime of 10-15 minutes is acceptable, but I don't want to remove all stacks and recreate them.



Solution

  • Solved. The long way...

    I had a VPS named "m1" running my Docker stacks, and yesterday I finally realized that I cannot update the MTU on a running cluster ((

    So I added "m2" (as manager) and "m3" (as worker) and created a new Docker swarm cluster on "m2"+"m3" (without "m1").

    1. I modified /lib/systemd/system/docker.service on "m2" and added --mtu=1450 to ExecStart:
    ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --mtu=1450
    

    I also created a new file /etc/docker/daemon.json containing { "mtu": 1450 } (thanks @BMitch)

    2. I removed the ingress network on "m2" and recreated it with the option "com.docker.network.driver.mtu": "1450"

    3. I added the MTU to all overlay networks in my project (in docker-compose):

    networks:
      network1:
        driver: overlay
        driver_opts:
          com.docker.network.driver.mtu: 1450
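
The ingress recreation mentioned above is not shown in the answer. It might look roughly like the sketch below, which follows the pattern from Docker's overlay network documentation. The `DOCKER` variable and the function names are hypothetical helpers added here so that the commands can be previewed with `DOCKER=echo`; they are not part of the original answer.

```shell
#!/bin/sh
# DOCKER is a hypothetical override: set DOCKER=echo to print the
# commands instead of running them.
DOCKER="${DOCKER:-docker}"

# recreate_ingress MTU
# Remove the swarm ingress network and create it again with the given MTU.
# "docker network rm ingress" asks for confirmation, and services that
# publish ports through the routing mesh have to be removed first.
recreate_ingress() {
    mtu="$1"
    "$DOCKER" network rm ingress || return 1
    "$DOCKER" network create \
        --driver overlay \
        --ingress \
        --opt "com.docker.network.driver.mtu=$mtu" \
        ingress
}

# show_network_options NAME
# Print the driver options of a network, to confirm that the MTU was applied.
show_network_options() {
    "$DOCKER" network inspect --format '{{json .Options}}' "$1"
}
```

Running `recreate_ingress 1450` on a manager node and then `show_network_options ingress` should show the MTU option if the recreation worked.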
    

    Update: the purpose of "m2"+"m3" was that I could verify the problem was solved on the new cluster while production kept running on "m1". I tried Portainer there and it worked without any of the lag I had seen before. Next time I will just remove the stacks on the laggy VPS, update the settings (MTU) and redeploy the stacks, which would be much faster!

    1. I removed the stacks from "m1" and copied the volumes to "m2" (thanks to "How to copy docker volume from one machine to another?")

    2. I deployed the services to "m2" and pointed my domain's DNS to the "m2" IP

    It works fine with no lag after the MTU update!
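
The volume copy is only referenced, not shown. One common way to do it is to stream a tar archive of the volume over SSH, sketched below. This is an assumption about the approach, not necessarily the exact commands from the referenced question. It assumes the stacks using the volume are already stopped, that "m1" can reach the target over SSH, and that the `alpine` image is available on both hosts; `copy_volume` is a hypothetical helper name.

```shell
#!/bin/sh
# copy_volume VOLUME TARGET
# Stream the contents of a named volume to a volume of the same name on
# TARGET (an SSH destination such as user@m2). Docker creates the named
# volume on the target if it does not exist yet.
copy_volume() {
    volume="$1"
    target="$2"
    # Left side: pack the volume (mounted read-only) into a tar stream.
    # Right side: unpack that stream into the volume on the target host.
    docker run --rm -v "$volume:/from:ro" alpine tar -C /from -cf - . |
        ssh "$target" "docker run --rm -i -v '$volume:/to' alpine tar -C /to -xf -"
}
```

For example, `copy_volume mydata user@m2` (both names are placeholders) would copy the volume `mydata` to "m2".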