
Operation on normalVectorRDD


I want to create an RDD[Vector] with my own mean and my own sigma. I have done this:

val mean = Random.nextInt(100)
val sigma = 2
val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, numRows = 180, numCols = 20).map(v => mean + sigma * v)

but I get the following error:

overloaded method value * with alternatives:
  (x: Double)Double <and>
  (x: Float)Float <and>
  (x: Long)Long <and>
  (x: Int)Int <and>
  (x: Char)Int <and>
  (x: Short)Int <and>
  (x: Byte)Int
 cannot be applied to (org.apache.spark.mllib.linalg.Vector)
      val data: RDD[Vector] = RandomRDDs.normalVectorRDD(sc, numRows = 180, numCols = 20).map(v => mean + sigma * v)

I don't understand this error, because in the Scala documentation they also do RandomRDDs.normal(sc, n, p, seed).map(lambda v: mean + sigma * v)

Thanks


Solution

  • The Spark documentation example refers to the .normalRDD() method, which returns an RDD[Double]:

    val data =
      RandomRDDs.normalRDD(spark.sparkContext, 50, 1).map(v => mean + sigma * v)
    

    This runs fine, because here v is a plain Double, so mean + sigma * v is ordinary arithmetic.
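
    The error in the question comes from operator resolution: sigma * v asks Int's overloaded * (which only accepts numeric types) to take an org.apache.spark.mllib.linalg.Vector, and no such overload exists; the snippet quoted in the question is from the Python API docs, where each v is a plain float. The element-wise fix is just an affine map over the vector's values. A minimal, Spark-free sketch of that transform (the sample array is illustrative):

    ```scala
    val mean  = 50.0
    val sigma = 2.0

    // The same element-wise affine transform used inside the Spark .map:
    val row     = Array(0.0, 1.0, -1.0)          // stand-in for one Vector's values
    val shifted = row.map(x => mean + sigma * x) // Array(50.0, 52.0, 48.0)
    ```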

    If you need to apply the transformation element-wise to a Vector, map over its values and rebuild the vector with Vectors.dense (otherwise you end up with an RDD[Array[Double]] instead of an RDD[Vector]):

    import org.apache.spark.mllib.linalg.Vectors

    val data0 =
      RandomRDDs.normalVectorRDD(spark.sparkContext, numRows = 180, numCols = 20)
        .map(v => Vectors.dense(v.toArray.map(x => mean + sigma * x)))