docker restore; lack of indication to go-dockerclient - FIXED


I followed Saied Kazemi's instructions on docker suspend and resume using CRIU, and used https://github.com/boucher/docker/tree/cr-defunct (based on feedback from Ross Boucher) to build 1.10.0-dev from source to get checkpoint/restore functionality.

I am now trying to work with docker-proxy (github.com/edmodo/docker-proxy), which in turn relies on go-dockerclient (github.com/fsouza/go-dockerclient) to get notified when containers are created, etc.

My question is specifically about the underlying events that the docker daemon sends to go-dockerclient. When containers are created, started or stopped, the appropriate events are received.

However, when I use restore, I do not see the event I had hoped to see. Perhaps I don't fully understand how restore works. I ran the docker daemon in debug mode to see what was happening.

I first checkpoint a running container a1 with:

docker checkpoint --image-dir=/tmp/ABC --leave_running a1

The corresponding debug output at the daemon was:

DEBU[0036] Calling POST /v1.22/containers/a1/checkpoint 
DEBU[0036] POST /v1.22/containers/a1/checkpoint         
DEBU[0036] form data {"ImagesDirectory":"/tmp/ABC","LeaveRunning":true,"WorkDirectory":""} 
DEBU[0036] Using CRIU 20000 at: criu                    
DEBU[0036] Using CRIU with following args: [swrk 3]     
DEBU[0036] Using CRIU in DUMP mode                      
DEBU[0036] CRIU option ImagesDirFd with value 22
<snip> .... I can paste this as well if needed        
DEBU[0036] CRIU option EmptyNs with value 1073741824  

Then I create a new container a2 with:

docker create --name=a2 alpine-sshd

The corresponding debug log for the create at the daemon was:

DEBU[0051] Calling POST /v1.22/containers/create        
DEBU[0051] POST /v1.22/containers/create?name=a2        
DEBU[0051] form data:{"AttachStderr":true,"AttachStdin":false,"AttachStdout":true,"Cmd":null,"Domainname":"","Entrypoint":null,"Env":[],"HostConfig":{"Binds":null,"BlkioDeviceReadBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceWriteIOps":null,"BlkioWeight":0,"BlkioWeightDevice":null,"CapAdd":null,"CapDrop":null,"CgroupParent":"","ConsoleSize":[0,0],"ContainerIDFile":"","CpuPeriod":0,"CpuQuota":0,"CpuShares":0,"CpusetCpus":"","CpusetMems":"","Devices":[],"Dns":[],"DnsOptions":[],"DnsSearch":[],"ExtraHosts":null,"GroupAdd":null,"IpcMode":"","Isolation":"","KernelMemory":0,"Links":null,"LogConfig":{"Config":{},"Type":""},"Memory":0,"MemoryReservation":0,"MemorySwap":0,"MemorySwappiness":-1,"NetworkMode":"default","OomKillDisable":false,"OomScoreAdj":0,"PidMode":"","PortBindings":{},"Privileged":false,"PublishAllPorts":false,"ReadonlyRootfs":false,"RestartPolicy":{"MaximumRetryCount":0,"Name":"no"},"SecurityOpt":null,"ShmSize":null,"UTSMode":"","Ulimits":null,"VolumeDriver":"","VolumesFrom":null},"Hostname":"","Image":"alpine-sshd","Labels":{},"OnBuild":null,"OpenStdin":false,"StdinOnce":false,"StopSignal":"SIGTERM","Tty":false,"User":"","Volumes":{},"WorkingDir":""} 
ERRO[0051] Couldn't run auplink before unmount: exec: "auplink": executable file not found in $PATH 
DEBU[0051] container mounted via layerStore: /var/lib/docker/0.0/aufs/mnt/a02ad092a4ae9d0ae40f26a8457fe8379e63a8362444aedb6d41c67d34b2cb83 
ERRO[0051] Couldn't run auplink before unmount: exec: "auplink": executable file not found in $PATH 

At this point, the a2 container has been created but is not running. The creation sends an event to the dockerclient indicating that a container has been created. As expected, docker ps -a shows two containers (a1 and a2) and docker ps shows one (a1).

After that, I restore a2 with the checkpointed image using

docker restore --force=true --image-dir=/tmp/ABC a2

The corresponding debug output for the restore was:

DEBU[0083] Calling POST /v1.22/containers/a2/restore    
DEBU[0083] POST /v1.22/containers/a2/restore?force=1    
DEBU[0083] form data {"ImagesDirectory":"/tmp/ABC","LeaveRunning":false,"WorkDirectory":""} 
DEBU[0083] container mounted via layerStore: /var/lib/docker/0.0/aufs/mnt/a02ad092a4ae9d0ae40f26a8457fe8379e63a8362444aedb6d41c67d34b2cb83 
DEBU[0083] Assigning addresses for endpoint a2's interface on network bridge 
DEBU[0083] RequestAddress(LocalDefault/172.17.0.0/16, <nil>, map[]) 
DEBU[0083] Assigning addresses for endpoint a2's interface on network bridge 
INFO[0083] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4] 
INFO[0083] IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844] 
DEBU[0083] Using CRIU 20000 at: criu                    
DEBU[0083] Using CRIU with following args: [swrk 3]     
DEBU[0083] Using CRIU in RESTORE mode                   
DEBU[0083] CRIU option ImagesDirFd with value 29        
<snip>.... I can paste this if needed
DEBU[0083] CRIU option EmptyNs with value 1073741824   

This starts up the container. However, the dockerclient receives no event of any kind from the daemon. Both containers work normally.

Is this lack of an event by design? Is there some other way to get notified that a container has started? I will have to dig deeper into go-dockerclient to see whether I am missing something there.

Any help will be much appreciated. Thanks in advance.


Solution

  • This branch represents the latest working version of docker with checkpoint/restore: https://github.com/boucher/docker/tree/cr-defunct

    There's also a precompiled version: https://github.com/boucher/docker/releases/tag/v1.10_2-16-16-experimental

    I believe that, although the "start" event won't fire, a "restore" event should be fired by the daemon.