
Scalding: How to change default tuple comparison function?


When doing Scalding MapReduce operations, I need to compare tuples using my own comparison function on tuple fields.

Questions:

  1. How to define my own tuple comparison function?
  2. What are the rules for extending Scalding with custom Scala code in general? Are there limitations?

Thanks!


Solution

  • You can create a virtual field (e.g. with com.twitter.scalding.RichPipe#map), sort by that field, and then discard it. Here is an example based on the Scalding documentation:

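    import com.twitter.scalding._

    // Inside a Job, which supplies the implicit flowDef and mode that .read needs:
    val users = Csv(file_source, separator = ",", fields = Schema)
      .read
      .map('age -> 'ageInt) { x: Int => x } // virtual field: 'age converted to Int
      .groupAll { _.sortBy('ageInt) }       // will sort age as a number
      .discard('ageInt)                     // drop the helper field again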
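To see why the virtual field matters, and what the same idea looks like as an explicit comparison function, here is a plain-Scala sketch that runs without Hadoop or Scalding. The object name, the (name, age) row type and the sample data are hypothetical. sortByAge mirrors the map/sortBy/discard pipeline on an in-memory Seq, while byAgeThenName expresses the ordering as a scala.math.Ordering, which is the kind of implicit Scalding's typed API expects when grouping (TypedPipe#groupBy takes an implicit Ordering for its key).

```scala
object VirtualFieldSort {
  // Hypothetical row schema: (name, age), with age still a String as read from CSV.
  type Row = (String, String)

  // Decorate-sort-undecorate, mirroring map('age -> 'ageInt), sortBy('ageInt), discard('ageInt).
  def sortByAge(rows: Seq[Row]): Seq[Row] =
    rows
      .map(row => (row, row._2.toInt)) // add the virtual numeric field
      .sortBy(_._2)                    // sort on it (numeric, not lexical)
      .map(_._1)                       // discard it again

  // The same idea as an explicit comparison function: age as a number, then name.
  val byAgeThenName: Ordering[Row] =
    Ordering.by[Row, (Int, String)] { case (name, age) => (age.toInt, name) }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("bob", "100"), ("carol", "9"), ("alice", "10"))
    println(rows.sortBy(_._2))          // string comparison on age
    println(sortByAge(rows))            // numeric comparison via the virtual field
    println(rows.sorted(byAgeThenName)) // numeric comparison via a custom Ordering
  }
}
```

The virtual-field version converts each age once per row, whereas the Ordering version calls toInt on every comparison; the latter is more compact when the key is cheap to compute.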