Given that I have a dataframe with some columns:
Why does this not work?
val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y)))
notebook:16: error: overloaded method value + with alternatives:
(x: Int)Int <and>
(x: Char)Int <and>
(x: Short)Int <and>
(x: Byte)Int
cannot be applied to (org.apache.spark.sql.Column)
val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y))) // does work
^
notebook:16: error: type mismatch;
found : Int
required: org.apache.spark.sql.Column
val output3b = input.withColumn("sum", columnsToConcat.foldLeft(0)((x,y)=>(x+y)))
But this does?
val output3a = input.withColumn("concat", columnsToConcat.foldLeft(lit(0))((x,y)=>(x+y)))
Using the famous lit function seems to smooth a few things, but I am not sure why.
+---+----+----+----+----+----+------+
| ID|var1|var2|var3|var4|var5|concat|
+---+----+----+----+----+----+------+
| a| 5| 7| 9| 12| 13| 46.0|
+---+----+----+----+----+----+------+
Prerequisites:
Based on the compiler message and the API usage we can deduce that columnsToConcat
is a Seq[o.a.s.sql.Column]
or equivalent.
By convention foldLeft
methods require function that maps to the accumulator (initial value). Here's Seq.foldLeft
signature
def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
+
in Scala is a method, specifically a syntactic sugar for .+
call.
It means that in case of:
columnsToConcat.foldLeft(0)((x,y)=>(x+y))
is
columnsToConcat.foldLeft(0)((x: Int, y: Column) => x + y)
and you're asking for +
method of Int
(inferred type of the accumulator - 0
), and since Int
- and there is no +
(org.apache.spark.sql.Column) => Int
method for Int
(the error already lists available methods, and it is hardly unexpected that such method doesn't exist), nor there exist, in the current scope, an implicit conversion from Int
to any type that provides such method.
In the second case you're asking
columnsToConcat.foldLeft(lit(0))((x,y)=>(x+y))
is
columnsToConcat.foldLeft(lit(0))((x: Column, y: Column) => x + y)
and +
refers to Column.+
(as type of lit(0)
is Column
) and such method, which acceptsAny
and returns Column
, exists. Since Column <: Any
type constraints are satisfied