
Scala/Scalding: Pivoting data


I have a dataset, the output of a pipe in Scalding, that looks like this:

'Var1, 'Var2, 'Var3, 'Var4 =
a,x,1,2
a,y,3,4
b,x,1,2
b,y,3,4

I'm trying to turn it into something like:

'Var1, 'Var3x, 'Var4x, 'Var3y, 'Var4y =
a,1,2,3,4
b,1,2,3,4

At first I thought flatMap might work, but that didn't seem right. It seems like the pivot function should do it, but I can't quite work out how to pivot multiple columns.

Any help is appreciated.


Solution

  • You need to combine your two value columns into one, and then you can use .pivot. Something like this:

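    import com.twitter.scalding._

    // Holds both value columns so they can be pivoted as a single field.
    case class V34(v3: Int, v4: Int)

    pipe
        // Combine 'Var3 and 'Var4 into one field, 'V34.
        .map(('Var3, 'Var4) -> 'V34) { vars: (Int, Int) => V34(vars._1, vars._2) }
        // Pivot: each value of 'Var2 ("x", "y") becomes a column holding that row's V34.
        .groupBy('Var1) { _.pivot(('Var2, 'V34) -> ('x, 'y)) }
        // Unpack the two V34 values into the four output columns.
        .mapTo(('Var1, 'x, 'y) -> ('Var1, 'Var3x, 'Var4x, 'Var3y, 'Var4y)) {
            vars: (String, V34, V34) =>
                val (key, xval, yval) = vars
                (key, xval.v3, xval.v4, yval.v3, yval.v4)
        }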
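
If it helps to see what the combine-then-pivot idea computes without running a Scalding job, here is a minimal sketch using plain Scala collections. The names `Row`, `Wide`, and `PivotSketch` are made up for this illustration and are not part of Scalding. One difference to keep in mind: as far as I know, Scalding's pivot fills a missing column with a default value (null unless you pass one), whereas this sketch simply drops any group that lacks an x or a y row.

```scala
// Plain-Scala illustration of the pivot; no Scalding dependency.
// Row, Wide and PivotSketch are hypothetical names used only for this sketch.
final case class Row(var1: String, var2: String, var3: Int, var4: Int)
final case class V34(v3: Int, v4: Int)
final case class Wide(var1: String, var3x: Int, var4x: Int, var3y: Int, var4y: Int)

object PivotSketch {
  def pivot(rows: Seq[Row]): Seq[Wide] =
    rows
      .groupBy(_.var1)   // one group per 'Var1 key
      .toSeq
      .sortBy(_._1)      // groupBy returns a Map, so sort for a stable order
      .flatMap { case (key, group) =>
        // Combine the two value columns, keyed by the pivot column ('Var2).
        val byVar2: Map[String, V34] =
          group.map(r => r.var2 -> V34(r.var3, r.var4)).toMap
        // Emit a wide row only when both the "x" and "y" entries exist.
        for {
          x <- byVar2.get("x")
          y <- byVar2.get("y")
        } yield Wide(key, x.v3, x.v4, y.v3, y.v4)
      }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Row("a", "x", 1, 2),
      Row("a", "y", 3, 4),
      Row("b", "x", 1, 2),
      Row("b", "y", 3, 4)
    )
    pivot(input).foreach(println)
  }
}
```

Running `PivotSketch.main` on the question's sample data prints `Wide(a,1,2,3,4)` and `Wide(b,1,2,3,4)`, which matches the desired output.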
    