How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.
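In the Java MapReduce API you specify it with job.setPartitionerClass(TotalOrderPartitioner.class); in mrjob, which runs jobs through Hadoop Streaming, the corresponding setting is the PARTITIONER class attribute (or the partitioner() method) on your MRJob, which takes the fully qualified Java class name, e.g. org.apache.hadoop.mapred.lib.TotalOrderPartitioner.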
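It is not the default. The default partitioner is HashPartitioner, which sends each key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Each reducer's output is sorted, but keys are spread across reducers by hash, so the output files taken in order are not globally sorted.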
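For intuition, here is a minimal Python sketch of hash partitioning (not Hadoop's code). It follows HashPartitioner's formula shape, a non-negative hash modulo the reducer count, but uses zlib.crc32 as a stand-in for Java's hashCode(), so the actual reducer assignments will differ from Hadoop's. The helper names hash_partition and run_shuffle are hypothetical.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Mimic HashPartitioner's shape: non-negative hash modulo reducer count.

    zlib.crc32 is a stand-in for Java's hashCode(); unlike Python's built-in
    hash() for strings, it gives the same value on every run.
    """
    h = zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF  # like & Integer.MAX_VALUE
    return h % num_reducers


def run_shuffle(keys, num_reducers, partition):
    """Route keys to reducers, then sort within each reducer (as Hadoop does)."""
    buckets = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[partition(k, num_reducers)].append(k)
    return [sorted(b) for b in buckets]


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
    outputs = run_shuffle(keys, 3, hash_partition)
    for i, part in enumerate(outputs):
        print(f"part-{i:05d}: {part}")
    concatenated = [k for part in outputs for k in part]
    # Each part is sorted, but nothing forces the concatenation to be.
    print("globally sorted?", concatenated == sorted(keys))
```

Whether the concatenation happens to come out sorted depends entirely on the hash values, which is one reason ordering can look consistent on one data set and not on another.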
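It is not an easy partitioner to use. TotalOrderPartitioner needs a partition file listing the split points between the reducers' key ranges, so before the job runs you must pre-sample your input with an InputSampler (for example via InputSampler.writePartitionFile) and point the partitioner at the resulting file.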
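The idea behind the sampling step can be sketched in pure Python. This is a simplified illustration, not Hadoop's implementation: sample the keys, take num_reducers - 1 evenly spaced split points from the sorted sample (dropping duplicates), then route each key to the reducer whose range contains it using a binary search. The function names and the random sampling fraction are assumptions made for illustration; in Hadoop the split points live in the partition file that TotalOrderPartitioner reads.

```python
import bisect
import random


def sample_keys(keys, fraction, seed=0):
    """Random sample of input keys (a rough analogue of a random InputSampler)."""
    rng = random.Random(seed)
    sample = [k for k in keys if rng.random() < fraction]
    return sample or list(keys)  # fall back to all keys if the sample is empty


def split_points(sample, num_reducers):
    """Pick up to num_reducers - 1 evenly spaced keys from the sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_reducers
    splits = []
    for i in range(1, num_reducers):
        idx = min(int(step * i + 0.5), len(ordered) - 1)  # round, stay in range
        candidate = ordered[idx]
        if not splits or candidate > splits[-1]:  # keep splits strictly increasing
            splits.append(candidate)
    return splits


def total_order_partition(key, splits):
    """Reducer index for key; a key equal to a split point goes to the upper range."""
    return bisect.bisect_right(splits, key)


def run_shuffle(keys, splits, num_reducers):
    """Route keys by range, then sort within each reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[total_order_partition(k, splits)].append(k)
    return [sorted(b) for b in buckets]


if __name__ == "__main__":
    keys = [f"key{n:03d}" for n in range(100)]
    random.Random(1).shuffle(keys)
    splits = split_points(sample_keys(keys, 0.2), num_reducers=4)
    outputs = run_shuffle(keys, splits, num_reducers=4)
    concatenated = [k for part in outputs for k in part]
    print("split points:", splits)
    # Range partitioning plus per-reducer sorting gives a global order.
    print("globally sorted?", concatenated == sorted(keys))  # prints True
```

Because every key below a split point lands in a lower-numbered reducer, the sorted part files read in order form one sorted sequence; a poor sample only makes the reducers unevenly loaded, it does not break the ordering.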
I wrote a very detailed tutorial with examples and illustrations (from beginner to advanced usage) on how to use these here.