hadoop-streaming mrjob hadoop-partitioning totalorderpartitioner

TotalOrderPartitioner and mrjob

How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.

Solution

In the Java API, you specify it on the job with job.setPartitionerClass(TotalOrderPartitioner.class);

It is not the default partitioner class. The default is the HashPartitioner class.
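To illustrate why the default does not give a global ordering, here is a minimal Python sketch of hash partitioning. It mirrors the formula HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, but the helper names are hypothetical and the hash is a stand-in modeled on Java's String.hashCode for ASCII strings; real Hadoop key types such as Text compute their own hash codes.

```python
def java_string_hash(s):
    """Stand-in hash modeled on Java's String.hashCode (assumes ASCII input)."""
    h = 0
    for ch in s:
        # Keep the running value within 32 bits, as Java int arithmetic would.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def hash_partition(key, num_reducers):
    """Pick a reducer the way a hash partitioner does: hash, clear sign bit, modulo."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def assign(keys, num_reducers):
    """Group keys by the reducer index they would be sent to."""
    buckets = {i: [] for i in range(num_reducers)}
    for key in keys:
        buckets[hash_partition(key, num_reducers)].append(key)
    return buckets
```

With four reducers, the keys "a", "b", "c", "d" land on reducers 1, 2, 3 and 0 respectively, so concatenating the reducer outputs in order does not yield sorted keys. Each reducer's output is sorted internally, but the ranges overlap across reducers.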

It is not an easy partitioner to use. The TotalOrderPartitioner has to know the key ranges for each reducer before the job runs, so you must use an InputSampler to pre-sample keys from your input and produce the split points it partitions on.
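The idea behind sampling can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions, not the Hadoop or mrjob API: the function names are hypothetical, the sample is an in-memory list, and the split points are taken at evenly spaced positions in the sorted sample.

```python
import bisect


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys at evenly spaced ranks of the sorted sample."""
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    return [
        ordered[len(ordered) * i // num_partitions]
        for i in range(1, num_partitions)
    ]


def total_order_partition(key, split_points):
    """Return the index of the key range that contains key.

    Keys below the first split point go to partition 0; a key equal to a
    split point goes to the partition on its right.
    """
    return bisect.bisect_right(split_points, key)


def assign(keys, split_points):
    """Group keys by partition index, creating one bucket per key range."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        buckets[total_order_partition(key, split_points)].append(key)
    return buckets
```

Because every key in partition i is no larger than every key in partition i + 1, sorting within each partition and concatenating the partitions in index order gives a fully sorted result. The quality of the sample matters: if it does not reflect the real key distribution, some partitions receive far more data than others, which may explain behavior that differs between data sets.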

I wrote a very detailed tutorial with examples and illustrations (from beginner to advanced usage) on how to use these here.