Search code examples
dockerdistributed-computingmesosapache-aurora

Launching jobs with large docker images in mesos via aurora can be slow


When launching a task over mesos via aurora that uses a rather large docker image (~2GB) there is a long wait time before the task actually starts.

Even when the task has been previously launched and we would expect the docker image to already be available to the worker node, there is still a waiting time dependent on image size before the task actually launches. Using docker, you can launch a container almost instantly as long as it is in your images list already, does the mesos containerizer not support this "caching" as well ? Is this functionality something that can be configured ?

I haven't tried using the docker containerizer, but it is my understanding that it will be phased out soon anyway and that gpu resource isolation, which we require, only works for the mesos containerizer.


Solution

  • I am assuming you are talking about the unified containerizer running docker images? What is backend you are using? By default the Mesos agents use the copy backend which is why you are seeing it being slow. You can look at the backend the agent is using by hitting flags endpoint on the agent. Switch the backend to aufs or overlayfs to see if speeds up the launch. You can specify the backend through the flag --image_provisioner_backend=VALUE on the agent.

    NOTE: There are few bugs fixes related to aufs and overlayfs backend in the latest Mesos release 1.2.0-rc1 that you might want to pick up. Not to mention that there is an autobackend feature in 1.2.0-rc1 that will automatically select the fastest backend available.