amazon-web-services hive mapreduce tez bigdata

Tez execution engine vs MapReduce execution engine in Hive


What is the difference between the Tez engine and the MapReduce engine in Hive, and which engine is better suited to which kind of operation (e.g. joins, aggregations)?


Solution

  • Tez is a DAG (directed acyclic graph) execution engine. A typical MapReduce job has the following steps:

    1. Read data from file --> first disk access

    2. Run mappers

    3. Write map output --> second disk access

    4. Run shuffle and sort --> read map output, third disk access

    5. Write shuffle and sort output --> write sorted data for reducers, fourth disk access

    6. Run reducers, which read the sorted data --> fifth disk access

    7. Write reducer output --> sixth disk access
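
To make the count concrete, here is a minimal Python sketch of the seven steps above as a single-process word count. It is a toy model, not Hadoop code: `CountingDisk`, `mapreduce_word_count` and `run_mapreduce_demo` are hypothetical names invented for this illustration, and JSON files in a temporary directory stand in for HDFS and local disk.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


class CountingDisk:
    """Toy stand-in for disk storage that counts every read and write."""

    def __init__(self, root):
        self.root = Path(root)
        self.accesses = 0

    def write(self, name, data):
        self.accesses += 1
        (self.root / name).write_text(json.dumps(data))

    def read(self, name):
        self.accesses += 1
        return json.loads((self.root / name).read_text())


def mapreduce_word_count(disk, input_name):
    """Word count that persists data at every stage boundary, as in the list."""
    lines = disk.read(input_name)                                   # step 1
    pairs = [[word, 1] for line in lines for word in line.split()]  # step 2
    disk.write("map_output", pairs)                                 # step 3
    pairs = disk.read("map_output")                                 # step 4
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    disk.write("sorted", dict(sorted(grouped.items())))             # step 5
    grouped = disk.read("sorted")                                   # step 6
    counts = {word: sum(ones) for word, ones in grouped.items()}
    disk.write("output", counts)                                    # step 7
    return counts


def run_mapreduce_demo():
    with tempfile.TemporaryDirectory() as tmp:
        disk = CountingDisk(tmp)
        disk.write("input", ["a b a", "b a"])
        disk.accesses = 0  # preparing the input is not part of the job
        counts = mapreduce_word_count(disk, "input")
        return counts, disk.accesses
```

Calling `run_mapreduce_demo()` returns the word counts together with the number of disk accesses the toy job made, one for each step in the list that touches disk.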

    Tez works much like Spark (Tez was originally developed at Hortonworks):

    1. Build the execution plan; no data is read from disk at this point.

    2. Once a result is actually needed (similar to actions in Spark), read the data from disk, perform all the steps, and produce the output.

    Only one read and one write.

    Efficiency improves because the job does not go to disk multiple times. Intermediate results are kept in memory where possible rather than written to disk.
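
The two Tez steps above can be sketched in the same toy style. This is not the Tez API: `LazyPlan`, `then`, `save` and `run_dag_demo` are hypothetical names, and the sketch assumes every intermediate result fits in memory, which a real engine cannot always guarantee (it may still spill to local disk).

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


class LazyPlan:
    """Hypothetical toy plan: stages are recorded first and run only on save()."""

    def __init__(self, source):
        self.source = Path(source)
        self.stages = []
        self.disk_accesses = 0

    def then(self, stage):
        self.stages.append(stage)  # nothing is read or computed yet
        return self

    def save(self, target):
        data = json.loads(self.source.read_text())  # the single read
        self.disk_accesses += 1
        for stage in self.stages:
            data = stage(data)  # each result is handed on in memory
        Path(target).write_text(json.dumps(data))  # the single write
        self.disk_accesses += 1
        return data


def split_words(lines):
    return [word for line in lines for word in line.split()]


def count_words(words):
    return dict(sorted(Counter(words).items()))


def run_dag_demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "input.json"
        source.write_text(json.dumps(["a b a", "b a"]))  # input preparation
        plan = LazyPlan(source).then(split_words).then(count_words)
        accesses_before_save = plan.disk_accesses
        counts = plan.save(Path(tmp) / "output.json")
        return counts, accesses_before_save, plan.disk_accesses
```

Building the plan touches no data; only `save` triggers the read, runs every stage, and writes the result, so the access counter goes from zero to two regardless of how many stages are chained.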