
Is there any scenario where we wouldn't want to reuse tez containers?


I started using Hive and Tez a few days ago for one of my projects. Along the way, I came across the property tez.am.container.reuse.enabled, which many sites recommend keeping set to true. I understand this is because it:

  • Limits the number of requests for new containers sent to the ResourceManager (RM)
  • Avoids the cost of spinning up a new container for every task, which saves time
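To put rough numbers on these two points, here is a back-of-the-envelope model. It is a sketch under idealized assumptions, not Tez code: every task is assumed to need one container, reuse is assumed to be perfect (each of a fixed number of concurrent containers runs tasks back to back), and the task count, concurrency, and per-container start-up cost are all hypothetical values.

```java
// Hypothetical back-of-the-envelope model of container reuse; NOT Tez code.
class ContainerReuseEstimate {

    // Without reuse, every task gets its own freshly launched container.
    static int containersWithoutReuse(int tasks) {
        return tasks;
    }

    // With ideal reuse, at most `slots` containers are launched and each one
    // runs several tasks back to back, so fewer launches are requested.
    static int containersWithReuse(int tasks, int slots) {
        return Math.min(tasks, slots);
    }

    // Aggregate start-up cost (summed over all launches, not wall-clock time).
    static long startupOverheadMs(int launches, long startupMsPerContainer) {
        return launches * startupMsPerContainer;
    }

    public static void main(String[] args) {
        int tasks = 100;        // hypothetical number of tasks in a vertex
        int slots = 10;         // hypothetical number of concurrent containers
        long startupMs = 2000;  // hypothetical container/JVM start-up cost

        int noReuse = containersWithoutReuse(tasks);
        int reuse = containersWithReuse(tasks, slots);
        System.out.println("Container requests without reuse: " + noReuse);
        System.out.println("Container requests with reuse:    " + reuse);
        System.out.println("Start-up cost without reuse: "
                + startupOverheadMs(noReuse, startupMs) + " ms");
        System.out.println("Start-up cost with reuse:    "
                + startupOverheadMs(reuse, startupMs) + " ms");
    }
}
```

In practice the real savings are smaller than this ideal, because a container can only be reused for tasks whose resource and locality requirements it matches.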

But I can't think of any scenario where we would want this property disabled. I have searched online for such cases, but I haven't found any.

Can anyone help me with this?


Solution

  • In terms of performance, there is no reason not to reuse containers. The Execution Efficiency section of this paper explains why very well, and this is why the default value of this parameter is true.

    However, I think there are some cases that might explain why this feature is still configurable:

    • You may want to disable it as a workaround. For example, this Hive ticket is still unresolved, and the problematic query works fine when tez.am.container.reuse.enabled=false. If my production workload is critical, I would rather run my jobs without reusing containers than be completely blocked.
    • The property may conflict with other properties, and depending on your priorities, you may decide to give up some performance. For example, the Configure Tez Container Reuse doc contains this warning:

    Do not use the tez.queue.name configuration parameter because it sets all Tez jobs to run on one particular queue.

    • Finally, the same doc contains another warning:

    Enabling this parameter improves performance by avoiding the memory overhead of reallocating container resources for every task. However, disable this parameter if the tasks contain memory leaks or use static variables.
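The static-variable warning follows from how reuse works: a reused container keeps the same JVM alive, so anything stored in a static field by one task is still there when the next task runs, and anything a task forgets to release keeps accumulating. The sketch below illustrates this with a hypothetical LeakyTask class (it is not the Tez processor API), and it simulates a "fresh container" by resetting the static state rather than actually starting a new JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; LeakyTask is NOT part of the Tez API.
class StaticStateDemo {

    static class LeakyTask {
        // Static fields live as long as the class stays loaded in the JVM,
        // so they survive from one task to the next in a reused container.
        static int rowsSeen = 0;
        static final List<byte[]> cache = new ArrayList<>();

        // Processes `rows` rows and returns the count the task believes it saw.
        int run(int rows) {
            for (int i = 0; i < rows; i++) {
                rowsSeen++;                 // bug: should be a per-task (instance) counter
                cache.add(new byte[1024]);  // bug: never cleared, so memory keeps growing
            }
            return rowsSeen;
        }
    }

    // Resets static state, standing in for a brand-new JVM (no container reuse).
    static void simulateFreshContainer() {
        LeakyTask.rowsSeen = 0;
        LeakyTask.cache.clear();
    }

    public static void main(String[] args) {
        // Container reuse: two tasks run back to back in the same JVM.
        int first = new LeakyTask().run(3);
        int second = new LeakyTask().run(3);
        System.out.println("Reused JVM: task 1 saw " + first + ", task 2 saw " + second
                + ", cached buffers = " + LeakyTask.cache.size());

        // No reuse: each task starts with clean static state.
        simulateFreshContainer();
        int a = new LeakyTask().run(3);
        simulateFreshContainer();
        int b = new LeakyTask().run(3);
        System.out.println("Fresh JVMs: task 1 saw " + a + ", task 2 saw " + b);
    }
}
```

With reuse, the second task reports 6 rows instead of 3 and the cache holds 6 buffers, which is exactly the kind of wrong result or slow memory growth that the doc suggests avoiding by disabling reuse (or, better, by fixing the task code).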