Tags: scala, apache-spark, k-means

How to apply k-means to a Parquet file?



I want to apply k-means to my Parquet file, but the following error appears:


java.lang.ArrayIndexOutOfBoundsException: 2

Code:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val Data = sqlContext.read.parquet("/usr/local/spark/dataset/norm")
val parsedData = Data.rdd.map(s => Vectors.dense(s.getDouble(1), s.getDouble(2))).cache()

val numClusters = 30
val numIteration = 1
val userClusterModel = KMeans.train(parsedData, numClusters, numIteration)
val userfeature1 = parsedData.first
val userCost = userClusterModel.computeCost(parsedData)
println("WSSSE for users: " + userCost)

How can I solve this error?


Solution

The rows have only two columns, at indices 0 and 1, so s.getDouble(2) reads past the end of the row and throws ArrayIndexOutOfBoundsException: 2. In addition, the first column holds an Int, not a Double, so it must be read with getInt. Build the vectors from the columns that actually exist, using getters that match their types:

  •     val parsedData = Data.rdd.map(s => Vectors.dense(s.getInt(0), s.getDouble(1))).cache()
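For completeness, here is a minimal end-to-end sketch. It assumes the Parquet file has exactly two columns, an integer followed by a double; run Data.printSchema() on your own file and adjust the column indices and getters to whatever it actually reports.

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val Data = sqlContext.read.parquet("/usr/local/spark/dataset/norm")
Data.printSchema()  // verify the column order and types before indexing into rows

// Index 0 is an Int and index 1 is a Double in the assumed schema;
// getInt/getDouble must match the actual column types or Spark throws at runtime.
val parsedData = Data.rdd
  .map(s => Vectors.dense(s.getInt(0).toDouble, s.getDouble(1)))
  .cache()

val numClusters = 30
val numIterations = 20  // a single iteration rarely converges; raise it if WSSSE keeps falling
val userClusterModel = KMeans.train(parsedData, numClusters, numIterations)
val userCost = userClusterModel.computeCost(parsedData)
println("WSSSE for users: " + userCost)

Note that the primitive getters such as getInt throw a NullPointerException on null cells, so filter or fill nulls first if the data may contain them.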