Docker containers don't start automatically with Docker Swarm


My system crashed unexpectedly yesterday and I have not been able to recover it. I did not set it up myself, so I don't know all the details, but please ask for anything you need. Previously, the system would start working again automatically after any VM restart, but since this crash it no longer does.

Here is my docker info output:

Containers: 168
 Running: 0
 Paused: 0
 Stopped: 168
Images: 241
Server Version: 1.12.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1228
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge host null
Swarm: pending
 NodeID: bg8sh8m6zm5llezlmcw00nqx6
 Is Manager: true
 ClusterID: 1wfvx3ze7tm1bb56a5zyk9xqs
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: ADDRESS //hidden for security reasons
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-91-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.804 GiB
Name: swarm-manager-1
ID: AXPO:VFSV:TDT3:6X7Y:QNAO:OZJN:U23R:V5S2:FU33:WUNI:CRPK:2E2C
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

My docker node ls output:

ID                           HOSTNAME         STATUS   AVAILABILITY  MANAGER STATUS
bg8sh8m6zm5llezlmcw00nqx6 *  swarm-manager-1  Ready    Active        Leader
c21j8nzzy3151vv06m54yyd1u    swarm-worker-1   Unknown  Active  

Here is the output from docker node inspect swarm-manager-1:

[
    {
        "ID": "bg8sh8m6zm5llezlmcw00nqx6",
        "Version": {
            "Index": 67823
        },
        "CreatedAt": "2016-10-13T23:46:00.580142516Z",
        "UpdatedAt": "2017-08-29T19:48:35.4197366Z",
        "Spec": {
            "Role": "manager",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "swarm-manager-1",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 2000000000,
                "MemoryBytes": 7305609216
            },
            "Engine": {
                "EngineVersion": "1.12.2",
                "Plugins": [
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            }
        },
        "Status": {
            "State": "ready"
        },
        "ManagerStatus": {
            "Leader": true,
            "Reachability": "reachable",
            "Addr": "ADDRESS" //hidden
        }
    }
]

Here is the output from docker node inspect swarm-worker-1:

[
    {
        "ID": "c21j8nzzy3151vv06m54yyd1u",
        "Version": {
            "Index": 67824
        },
        "CreatedAt": "2017-02-21T05:42:31.467777741Z",
        "UpdatedAt": "2017-08-29T19:48:35.4252027Z",
        "Spec": {
            "Role": "worker",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "swarm-worker-1",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 2000000000,
                "MemoryBytes": 7305609216
            },
            "Engine": {
                "EngineVersion": "1.12.2",
                "Plugins": [
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            }
        },
        "Status": {
            "State": "unknown",
            "Message": "Node moved to \"unknown\" state due to leadership change in cluster"
        }
    }
]

Any ideas on how to get it working again?


Solution

  • On the swarm worker, run docker swarm leave. Then, on the manager, run docker swarm join-token worker and execute the join command it prints back on the worker. It should start working again.

    The crash may have corrupted the swarm state (the worker is stuck in the "unknown" state), in which case the worker's swarm membership has to be recreated.
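The steps in the answer can be sketched as a small shell script. This is only an illustration, not a tested recovery procedure: the function names, the `run` helper, and the `DRY_RUN` switch are all hypothetical. With `DRY_RUN=1` (the default assumed here) it only prints the commands, which makes it easier to review what would run on each node first.

```shell
#!/bin/sh
# Sketch of the worker rejoin steps. DRY_RUN and the function names below are
# illustrative assumptions, not part of Docker.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on the worker (swarm-worker-1): drop the stale swarm membership.
worker_leave() {
    run docker swarm leave
}

# Step 2, on the manager (swarm-manager-1): print the join command for workers.
# Copy the "docker swarm join ..." command it prints and run it on the worker.
manager_join_token() {
    run docker swarm join-token worker
}

# Step 3, on the manager: check whether the worker shows up as Ready again.
manager_verify() {
    run docker node ls
}

# Dispatch on the first argument; with no argument nothing is executed.
case "${1:-}" in
    worker-leave)  worker_leave ;;
    manager-token) manager_join_token ;;
    verify)        manager_verify ;;
esac
```

Assuming the script is saved as `rejoin.sh` (a hypothetical name), `DRY_RUN=0 sh rejoin.sh worker-leave` would run on the worker, and the `manager-token` and `verify` steps on the manager. After the worker rejoins it typically appears under a new node ID, so the old entry may remain listed as down; it can be removed with docker node rm if desired.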