Hadoop Total Order Partitioning


Why do we need total order partitioning in Hadoop, and in which scenarios should it be used? My understanding is that with multiple reducers, each reducer's output is already sorted by key, so why is total order partitioning needed? It would be great if you could share a graphical representation or some examples.


Solution

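  • Each reducer's output is sorted by key on its own, but with the default partitioning the keys are scattered across reducers, so there is no ordering between one reducer's output and the next. Total order partitioning assigns each reducer a contiguous key range, so every key handled by one reducer sorts before every key handled by the next. This lets you concatenate the outputs of multiple reducers, in reducer order, and still get globally sorted output. Simple example below: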

    Without total order partitioning

    reducer 1's output: 
    (a,val_a)
    (m,val_m)
    (x,val_x)
    
    reducer 2's output: 
    (b,val_b)
    (c,val_c)
    

    If you concatenate the two outputs, the result is no longer sorted by key.

    (a,val_a)
    (m,val_m)
    (x,val_x)
    (b,val_b)
    (c,val_c)
    

    With total order partitioning

    reducer 1's output: 
    (a,val_a)
    (b,val_b)
    (c,val_c)
    
    reducer 2's output: 
    (m,val_m)
    (x,val_x)
    

    If you concatenate the two outputs, the result is still sorted by key.

    (a,val_a)
    (b,val_b)
    (c,val_c)
    (m,val_m)
    (x,val_x)
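
The two cases above can be reproduced with a small simulation. This is only a sketch in plain Python, not Hadoop code: `partition_by_hash`, `partition_by_range`, `run_reducers`, and `concatenate` are hypothetical helper names, the byte-sum hash is an assumed stand-in for a hash partitioner, and the single split point `"m"` is picked by hand. In a real job, Hadoop's `TotalOrderPartitioner` reads its split points from a partition file, typically produced by sampling the input. The hash case here assigns keys to reducers differently from the listing above, but it makes the same point.

```python
from bisect import bisect_right

def partition_by_hash(key, num_reducers):
    # Assumed stand-in for a hash partitioner: the reducer index depends
    # only on a hash of the key, not on where the key falls in sort order.
    return sum(key.encode("utf-8")) % num_reducers

def partition_by_range(key, split_points):
    # Range partitioning: keys below split_points[0] go to reducer 0,
    # keys from split_points[0] up to split_points[1] go to reducer 1, etc.
    return bisect_right(split_points, key)

def run_reducers(records, num_reducers, partition):
    # Route each record to a reducer, then sort within each reducer,
    # mimicking the per-reducer sort that MapReduce always provides.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key)].append((key, value))
    return [sorted(bucket) for bucket in buckets]

def concatenate(outputs):
    # Join reducer outputs in reducer order, like concatenating part files.
    return [pair for output in outputs for pair in output]

records = [(k, "val_" + k) for k in ["x", "b", "m", "a", "c"]]

hashed = run_reducers(records, 2, lambda k: partition_by_hash(k, 2))
ranged = run_reducers(records, 2, lambda k: partition_by_range(k, ["m"]))

print("hash partitioning: ", concatenate(hashed))
print("range partitioning:", concatenate(ranged))
print("hash result globally sorted? ", concatenate(hashed) == sorted(records))
print("range result globally sorted?", concatenate(ranged) == sorted(records))
```

In both runs every reducer's own output is sorted; only the range-partitioned run stays sorted after concatenation. The trade-off is that the split points have to divide the keys roughly evenly, otherwise one reducer ends up with most of the data.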