
How to apply a base-2 logarithm to an RDD of Ints in Spark?


What is the proper way to apply log2 to my RDD of numbers? Is there a function to help with this?


Solution

  • RDD:

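    import org.apache.commons.math.util.MathUtils
    import org.apache.spark.rdd.RDD

    // With Commons Math 3, use org.apache.commons.math3.util.FastMath.log(2.0, x) instead.
    val rdd: RDD[Double] = ???
    rdd.map(x => MathUtils.log(2.0, x)) // base-2 logarithm of each element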
    

  • DataFrame:

    import org.apache.spark.sql.functions.log2

    // toDF needs the session implicits in scope, e.g. import spark.implicits._
    rdd.toDF("value").select(log2("value"))
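
Since the question mentions an RDD of Ints, another option that needs no extra library is the change-of-base identity log2(x) = ln(x) / ln(2) from `scala.math`. The following is only a minimal sketch and not part of the original answer: the object name `Log2Example`, the helper `log2`, and the sample values are hypothetical, and a plain `Seq` stands in for the RDD so the snippet runs without Spark. With a real `RDD[Int]`, the same helper could presumably be passed to `rdd.map` in the same way.

```scala
object Log2Example {
  // Hypothetical helper: base-2 logarithm via change of base,
  // log2(x) = ln(x) / ln(2). The Int is converted to Double first.
  // Note: 0 yields -Infinity and negative inputs yield NaN.
  def log2(x: Int): Double = math.log(x.toDouble) / math.log(2.0)

  def main(args: Array[String]): Unit = {
    // A local Seq stands in for the RDD here (assumption for illustration).
    val values: Seq[Int] = Seq(1, 2, 8, 1024)
    val logs: Seq[Double] = values.map(log2)

    // Results are floating point and may differ from exact integers
    // by a tiny rounding error.
    values.zip(logs).foreach { case (v, l) => println(s"log2($v) = $l") }
  }
}
```

Because the results are doubles, compare them with a small tolerance rather than with `==` if exact integer values are expected.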