
Create a mapped RDD and save it as text


I ran a K-means example and now have an RDD with my data named parsedData and my model named clusters. I want to create a mapped RDD pairing each data point with the cluster the model predicts for it. So I tried

val predictions = parsedData.map( point => {
  val pointPred = clusters.predict(point)
  Array(point, pointPred)
})

When I try

 predictions.first()

I get

Array[Any] = Array([0.8898668778942382,0.89533945283595], 0)

which is the result I want. So then I tried

predictions.saveAsTextFile("/../ClusterResults")

to save the Arrays for each data point in a local file, but the file created was

[Ljava.lang.Object;@3b43c55c

[Ljava.lang.Object;@5e523969

[Ljava.lang.Object;@68374cdf ....

that is, it had the object references and not the data. I also tried to print from the RDD:

predictions.take(10).foreach(println)

and again got the object representations as a result. How can I get the data, rather than the objects, and save it to a local file?


Solution

  • The problem lies in the way you map your data: an Array inherits java.lang.Object's toString, so writing it out produces the class name and hash code rather than the contents. Try using a Tuple, whose toString prints its elements, instead of an Array.

    Example:

    val predictions = parsedData.map( point => {
      (point, clusters.predict(point))
    })
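
    Both saveAsTextFile and println call toString on each record, which is why the Array version writes [Ljava.lang.Object;@... while the Tuple version shows the values. A quick plain-Scala illustration of the difference (no Spark needed; the sample values here are arbitrary):

    ```scala
    // An Array prints its JVM type tag and hash code, because
    // arrays do not override Object.toString.
    val arr: Array[Any] = Array(0.5, 0)
    println(arr.toString)                  // e.g. [Ljava.lang.Object;@3b43c55c

    // A Tuple2 overrides toString and prints its elements.
    val tup = (0.5, 0)
    println(tup.toString)                  // (0.5,0)

    // To get readable text from an Array, format it explicitly:
    println(arr.mkString("[", ",", "]"))   // [0.5,0]
    ```

    With the tuple-based predictions, saveAsTextFile writes one line per record in the (point,cluster) form; if you prefer CSV-style lines instead, map each record through mkString before saving.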