My system crashed unexpectedly yesterday and I have not been able to recover it. I did not set this system up myself, so I don't know all the details, but please ask for anything you need. Until now, the system would come back up automatically after any VM restart, but since this crash it no longer does.
Here is my docker info output:
Containers: 168
 Running: 0
 Paused: 0
 Stopped: 168
Images: 241
Server Version: 1.12.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1228
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge host null
Swarm: pending
 NodeID: bg8sh8m6zm5llezlmcw00nqx6
 Is Manager: true
 ClusterID: 1wfvx3ze7tm1bb56a5zyk9xqs
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: ADDRESS //hidden for security reasons
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-91-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.804 GiB
Name: swarm-manager-1
ID: AXPO:VFSV:TDT3:6X7Y:QNAO:OZJN:U23R:V5S2:FU33:WUNI:CRPK:2E2C
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
My docker node ls output:
ID                           HOSTNAME         STATUS   AVAILABILITY  MANAGER STATUS
bg8sh8m6zm5llezlmcw00nqx6 *  swarm-manager-1  Ready    Active        Leader
c21j8nzzy3151vv06m54yyd1u    swarm-worker-1   Unknown  Active
Here is the output from docker node inspect swarm-manager-1:
[
{
"ID": "bg8sh8m6zm5llezlmcw00nqx6",
"Version": {
"Index": 67823
},
"CreatedAt": "2016-10-13T23:46:00.580142516Z",
"UpdatedAt": "2017-08-29T19:48:35.4197366Z",
"Spec": {
"Role": "manager",
"Availability": "active"
},
"Description": {
"Hostname": "swarm-manager-1",
"Platform": {
"Architecture": "x86_64",
"OS": "linux"
},
"Resources": {
"NanoCPUs": 2000000000,
"MemoryBytes": 7305609216
},
"Engine": {
"EngineVersion": "1.12.2",
"Plugins": [
{
"Type": "Network",
"Name": "bridge"
},
{
"Type": "Network",
"Name": "host"
},
{
"Type": "Network",
"Name": "null"
},
{
"Type": "Network",
"Name": "overlay"
},
{
"Type": "Volume",
"Name": "local"
}
]
}
},
"Status": {
"State": "ready"
},
"ManagerStatus": {
"Leader": true,
"Reachability": "reachable",
"Addr": "ADDRESS" //hidden
}
}
]
Here is the output from docker node inspect swarm-worker-1:
[
{
"ID": "c21j8nzzy3151vv06m54yyd1u",
"Version": {
"Index": 67824
},
"CreatedAt": "2017-02-21T05:42:31.467777741Z",
"UpdatedAt": "2017-08-29T19:48:35.4252027Z",
"Spec": {
"Role": "worker",
"Availability": "active"
},
"Description": {
"Hostname": "swarm-worker-1",
"Platform": {
"Architecture": "x86_64",
"OS": "linux"
},
"Resources": {
"NanoCPUs": 2000000000,
"MemoryBytes": 7305609216
},
"Engine": {
"EngineVersion": "1.12.2",
"Plugins": [
{
"Type": "Network",
"Name": "bridge"
},
{
"Type": "Network",
"Name": "host"
},
{
"Type": "Network",
"Name": "null"
},
{
"Type": "Network",
"Name": "overlay"
},
{
"Type": "Volume",
"Name": "local"
}
]
}
},
"Status": {
"State": "unknown",
"Message": "Node moved to \"unknown\" state due to leadership change in cluster"
}
}
]
Any ideas on how to get it working again?
On the swarm worker, run docker swarm leave. Then, on the manager, run docker swarm join-token worker and execute the join command it prints back on the worker. It should start working again.
The crash may have corrupted the swarm state, which is why you need to recreate it.
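For reference, the steps above could be sketched as a small shell helper. This is only an illustration under a few assumptions: the function names (run, worker_leave, manager_token, worker_join, manager_verify) are made up for this sketch, ADDRESS is the placeholder from the question because the real manager address is hidden, and 2377 is Docker's default swarm management port, which may differ in your setup. By default the commands are only printed; nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the rejoin procedure. Commands are printed only, unless RUN=1.
# ADDRESS is a placeholder (the real manager address is hidden in the question);
# 2377 is the default swarm management port and is an assumption here.
MANAGER_ADDR="${MANAGER_ADDR:-ADDRESS:2377}"

run() {
    # Print the command, and execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1, on swarm-worker-1: drop the stale swarm membership.
worker_leave() { run docker swarm leave; }

# Step 2, on swarm-manager-1: print the join command (with token) for workers.
manager_token() { run docker swarm join-token worker; }

# Step 3, on swarm-worker-1: rejoin, passing the token printed in step 2 as $1.
worker_join() { run docker swarm join --token "$1" "$MANAGER_ADDR"; }

# Step 4, on swarm-manager-1: check that the worker shows up as Ready.
manager_verify() { run docker node ls; }
```

After sourcing the file, calling worker_leave prints "+ docker swarm leave"; setting RUN=1 on the appropriate host actually runs the command. In practice it is simpler to copy the full join command that step 2 prints than to pass the token by hand.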