Search code examples
apachecontainerstez

How container reuse works in Apache Tez? While reusing what is the data stored in shared location?


While Apache tez reuses containers,what is the process takes place. Can anyone explain me clearly?


Solution

  • Please read Hortonworks (most significant Tez contributor) docs here: https://hortonworks.com/blog/re-using-containers-in-apache-tez/

    Each vertex in Tez specifies parameters, which are used when launching containers. These include the requested resources (memory, CPU etc), YARN LocalResources, the environment, and the command line options for tasks belonging to this Vertex. When a container is first launched, it is launched for a specific task and uses the parameters specified for the task (or vertex) – this then becomes the container’s signature. An already running container is considered to be compatible for another task when the running container’s signature is a superset of what the task requires.

    The Tez scheduler works with several parameters to take decisions on task assignments – task-locality requirements, compatibility of containers as described above, total available resources on the cluster, and the priority of pending task requests.

    When a task completes, and the container running the task becomes available for re-use – a task may not be assigned to it immediately – as tasks may not exist, for which the data is local to the container’s node. The Tez scheduler first makes an attempt to find a task for which the data would be local for the container. If no such task exists, the scheduler holds on to the container for a specific time, before actually allocating any pending tasks to this container.

    Each Tez JVM (or container) contains an object cache, which can be used to share data between different tasks running within the same container. This is a simple Key-Object store, with different levels of visibility/retention. Objects can be cached for use within tasks belonging to the same Vertex, for all tasks within a DAG, and for tasks running across a Tez Session.