
When to use the MR engine and when to use Tez in Hive?


Under what conditions is it preferable to use Tez rather than MR as the Hive execution engine?

What are the pros and cons of each?


Solution

  • Tez does the same work as MR, only faster. The more complex the query, the more it benefits from Tez, so Tez is always preferable when it works.

    Tez generalizes the MapReduce paradigm to a more powerful framework by providing the ability to execute a complex DAG (directed acyclic graph) of tasks for a single job. When the plan is implemented via map-reduce primitives, there is an inevitable number of job boundaries, which introduce overheads of read/write to durable storage and job startup, and which may miss out on easy optimization opportunities such as worker node reuse and warm caches.

    Of course, your Tez version may still have unresolved bugs; that is the only problem you may face when implementing a particular solution on Tez.

    Although MR is more mature, Hive-on-MR is deprecated in Hive 2 and may not be available in future versions.

    See also:

    Difference between MR and Tez

    Introducing Tez
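To make the "job boundaries" argument from the quoted Tez description concrete, here is a deliberately simplified Python model. It is an illustration only, not how Hive actually compiles queries. It assumes that each reduce-side stage of a query (for example a join, a GROUP BY, an ORDER BY) becomes a separate MapReduce job whose output must be written to durable storage before the next job can start, while Tez submits the same plan as a single DAG and passes intermediate data between vertices directly. The names `Cost`, `mr_cost` and `tez_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    job_launches: int    # number of separate jobs submitted to the cluster
    durable_writes: int  # intermediate results materialized to durable storage (e.g. HDFS)


def mr_cost(reduce_stages: int) -> Cost:
    # Toy assumption: every reduce-side stage becomes its own MapReduce job.
    # A map-only query (0 reduce stages) still needs one job.
    jobs = max(1, reduce_stages)
    # Every job except the last must write its output so the next job can read it.
    return Cost(job_launches=jobs, durable_writes=jobs - 1)


def tez_cost(reduce_stages: int) -> Cost:
    # Toy assumption: the whole plan runs as one DAG, so there is a single
    # job launch and no intermediate round-trips through durable storage.
    return Cost(job_launches=1, durable_writes=0)


if __name__ == "__main__":
    for stages in (1, 3, 5):
        print(stages, mr_cost(stages), tez_cost(stages))
```

Under this model the MR overhead grows linearly with the number of stages while the Tez cost stays constant, which matches the point that more complex queries benefit more from Tez. Real-world gains also depend on factors the model ignores, such as container reuse and cached data. In Hive itself the engine is selected with the `hive.execution.engine` property (for example `SET hive.execution.engine=mr;` to fall back to MR for a query that hits a Tez bug).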