hadoop-streaming, mrjob, hadoop-partitioning, totalorderpartitioner

TotalOrderPartitioner and mrjob


How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.


Solution

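  • With Hadoop's Java API, you can specify it with job.setPartitionerClass(TotalOrderPartitioner.class); mrjob runs on Hadoop Streaming, so there the partitioner class name is set on the job itself (for example through the MRJob class's PARTITIONER attribute) rather than through this Java call.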

    It is not the default partitioner class. The default is the HashPartitioner class.

    It is not an easy partitioner to use. The TotalOrderPartitioner reads its split points from a partition file, so you must first use an InputSampler to pre-sample your input and write that file.

    I wrote a very detailed tutorial with examples and illustrations (from beginner to advanced usage) on how to use these here.
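To make the difference between the two partitioners concrete, here is a minimal sketch in plain Python. It is not Hadoop's or mrjob's code: the function names are hypothetical, crc32 only stands in for Java's hashCode(), and the sampling step is a simplified stand-in for what an InputSampler does before the job runs.

```python
import bisect
import random
import zlib


def hash_partition(key, num_partitions):
    """Hash-style assignment: spreads keys around, ignores their sort order."""
    # crc32 is an assumption made for determinism; Hadoop uses the key's hashCode().
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def sample_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced elements of the sorted sample become the partition boundaries.
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def total_order_partition(key, split_points):
    """Range assignment: keys in partition i all sort before keys in partition i + 1."""
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    keys = ["key%04d" % n for n in range(1000)]
    splits = sample_split_points(keys, num_partitions=4, sample_size=100)
    print("split points:", splits)
    for k in ("key0003", "key0500", "key0999"):
        print(k, "hash ->", hash_partition(k, 4),
              "total order ->", total_order_partition(k, splits))
```

With range assignment, concatenating the sorted output of partition 0, 1, 2, ... gives one globally sorted result, which hash assignment cannot guarantee. How balanced the partitions are depends on how well the sample represents the data, which may be one reason behavior differs between data sets.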