scala, apache-spark, udf

How can I see the code in functions.scala in Spark's GitHub repository?


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala

I'm trying to write my own UDF for standard deviation in Spark 1.5 and was hoping to see how it is implemented in 1.6. If that is not possible, how would I write a function in Scala that calculates the standard deviation of a column, given its column name? Thanks.

def stddev(columnName: String): Column = ???


Solution

  • You don't need an actual UDF for this. You can calculate the (population) standard deviation inside an aggregation by combining the built-in aggregate functions, like so:

    val df = sc.parallelize(Seq(1,2,3,4)).toDF("myCol")
    df.show
    
    >+-----+
    >|myCol|
    >+-----+
    >|    1|
    >|    2|
    >|    3|
    >|    4|
    >+-----+
    
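    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{avg, sqrt}
    
    // Population standard deviation: sqrt(E[X^2] - E[X]^2)
    def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))
    // toDF and $"..." rely on sqlContext.implicits._ (pre-imported in spark-shell)
    df.agg(stddev($"myCol")).first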
    
    > [1.118033988749895]
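
The expression above divides by n, so it gives the population standard deviation, whereas the `stddev` added in Spark 1.6 is an alias for the sample version (dividing by n - 1). The following is a minimal sketch of both variants, with overloads taking a column name as in the question. The object and method names (`StdDevSketch`, `stddevPop`, `stddevSamp`) are made up for illustration and are not part of Spark's API; only `avg`, `count`, `sqrt`, and `col` come from `org.apache.spark.sql.functions`.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, col, count, sqrt}

// Hypothetical helper object; the names are chosen to avoid clashing
// with any built-in stddev function in later Spark versions.
object StdDevSketch {
  // Population variance via E[X^2] - E[X]^2, as in the answer above.
  private def variancePop(c: Column): Column = avg(c * c) - avg(c) * avg(c)

  // Population standard deviation (divides by n).
  def stddevPop(c: Column): Column = sqrt(variancePop(c))

  // Sample standard deviation (divides by n - 1):
  // rescale the population variance by n / (n - 1) before the square root.
  def stddevSamp(c: Column): Column = {
    val n = count(c) // counts non-null values, matching what avg uses
    sqrt(variancePop(c) * n / (n - 1))
  }

  // Overloads that take a column name, matching the signature in the question.
  def stddevPop(columnName: String): Column = stddevPop(col(columnName))
  def stddevSamp(columnName: String): Column = stddevSamp(col(columnName))
}
```

With the `df` from the answer, `df.agg(StdDevSketch.stddevPop("myCol"), StdDevSketch.stddevSamp("myCol")).first` should give roughly 1.118 and 1.291. Keep in mind that the E[X^2] - E[X]^2 form can lose precision when the mean is large relative to the spread, so treat this as a workaround for 1.5 rather than a replacement for a built-in aggregate.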