Search code examples
kubernetesdocker-ucpdocker-ee

Docker UCP controller down with error: Unhealthy UCP manager: unable to reach manager: connect: connection refused


I have setup a test environment with docker UCP , after some days , one of the controller randomly went down with message in UCP that the host is down and the cluster is not healthy.

Logs of controller container:

{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:10Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context

Can be a random network connectivity problem? but it should have been restored automatically ?


Solution

  • After examining the docker daemon on the host , I saw that the system went into this issue:

    https://github.com/docker/for-linux/issues/162