I followed Saied Kazemi's instructions on docker suspend and resume using criu, and (based on feedback from Ross Boucher) used https://github.com/boucher/docker/tree/cr-defunct to build 1.10.0-dev from source to get checkpoint/restore functionality.
I am now trying to work with docker-proxy (github.com/edmodo/docker-proxy), which in turn relies on go-dockerclient (github.com/fsouza/go-dockerclient) to get indications of containers being created, etc.
My question is specifically about the underlying triggers that the docker daemon sends to go-dockerclient. When containers are created, started or stopped, the appropriate indicators are received.
However, when I use restore, I am not seeing what I had hoped to see. Perhaps I don't fully understand how restore works. I ran the docker daemon in debug mode to see what was happening.
I first checkpoint a running container, a1, with:
docker checkpoint --image-dir=/tmp/ABC --leave_running a1
The corresponding debug output at the daemon was:
DEBU[0036] Calling POST /v1.22/containers/a1/checkpoint
DEBU[0036] POST /v1.22/containers/a1/checkpoint
DEBU[0036] form data {"ImagesDirectory":"/tmp/ABC","LeaveRunning":true,"WorkDirectory":""}
DEBU[0036] Using CRIU 20000 at: criu
DEBU[0036] Using CRIU with following args: [swrk 3]
DEBU[0036] Using CRIU in DUMP mode
DEBU[0036] CRIU option ImagesDirFd with value 22
<snip> .... I can paste this as well if needed
DEBU[0036] CRIU option EmptyNs with value 1073741824
Then I create a new container, a2, with:
docker create --name=a2 alpine-sshd
The corresponding debug log for the create at the daemon was:
DEBU[0051] Calling POST /v1.22/containers/create
DEBU[0051] POST /v1.22/containers/create?name=a2
DEBU[0051] form data:{"AttachStderr":true,"AttachStdin":false,"AttachStdout":true,"Cmd":null,"Domainname":"","Entrypoint":null,"Env":[],"HostConfig":{"Binds":null,"BlkioDeviceReadBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceWriteIOps":null,"BlkioWeight":0,"BlkioWeightDevice":null,"CapAdd":null,"CapDrop":null,"CgroupParent":"","ConsoleSize":[0,0],"ContainerIDFile":"","CpuPeriod":0,"CpuQuota":0,"CpuShares":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"","Isolation":"","KernelMemory":0,"Links":null,"LogConfig":{"Config":{},"Type":""},"Memory":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":-1,"NetworkMode":"default","OomKillDisable":false,"OomScoreAdj":0,"PidMode":"","PortBindings":{},"Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"RestartPolicy":{"MaximumRetryCount":0,"Name":"no"},"SecurityOpt":null,"ShmSize":null,"UTSMode":"","Ulimits":null,"VolumeDriver":"","VolumesFrom":null},"Hostname":"","Image":"alpine-sshd","Labels":{},"OnBuild":null,"OpenStdin":false,"StdinOnce":false,"StopSignal":"SIGTERM","Tty":false,"User":"","Volumes":{},"WorkingDir":""}
ERRO[0051] Couldn't run auplink before unmount: exec: "auplink": executable file not found in $PATH
DEBU[0051] container mounted via layerStore: /var/lib/docker/0.0/aufs/mnt/a02ad092a4ae9d0ae40f26a8457fe8379e63a8362444aedb6d41c67d34b2cb83
ERRO[0051] Couldn't run auplink before unmount: exec: "auplink": executable file not found in $PATH
At this point, the a2 container is created but not running. This creation causes an indication to the dockerclient that a container has been created but is not running. docker ps -a and docker ps show two containers (a1 and a2) and one container (a1) respectively, as expected.
After that, I restore a2 from the checkpointed image using:
docker restore --force=true --image-dir=/tmp/ABC a2
The corresponding debug for restore was:
DEBU[0083] Calling POST /v1.22/containers/a2/restore
DEBU[0083] POST /v1.22/containers/a2/restore?force=1
DEBU[0083] form data {"ImagesDirectory":"/tmp/ABC","LeaveRunning":false,"WorkDirectory":""}
DEBU[0083] container mounted via layerStore: /var/lib/docker/0.0/aufs/mnt/a02ad092a4ae9d0ae40f26a8457fe8379e63a8362444aedb6d41c67d34b2cb83
DEBU[0083] Assigning addresses for endpoint a2's interface on network bridge
DEBU[0083] RequestAddress(LocalDefault/172.17.0.0/16, <nil>, map[])
DEBU[0083] Assigning addresses for endpoint a2's interface on network bridge
INFO[0083] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]
INFO[0083] IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]
DEBU[0083] Using CRIU 20000 at: criu
DEBU[0083] Using CRIU with following args: [swrk 3]
DEBU[0083] Using CRIU in RESTORE mode
DEBU[0083] CRIU option ImagesDirFd with value 29
<snip>.... I can paste this if needed
DEBU[0083] CRIU option EmptyNs with value 1073741824
This starts up the container. However, no indicator of any kind is passed from the daemon to the dockerclient. Both containers work normally.
Is this lack of indication by design? Is there some other way to get a trigger when a container has started? I may have to dig deeper into go-dockerclient to see if I am missing something there.
Any help will be much appreciated. Thanks in advance.
This branch represents the latest working version of docker with checkpoint restore: https://github.com/boucher/docker/tree/cr-defunct
There's also a precompiled version: https://github.com/boucher/docker/releases/tag/v1.10_2-16-16-experimental
I believe that, although the "start" event won't fire, a "restore" event should be fired by the daemon.