Hello! I'm interested in Spark. I run the code below in spark-shell.
val data = sc.parallelize(Array(Array(1,2,3), Array(2,3,4), Array(1,2,1)))
data: org.apache.spark.rdd.RDD[Array[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:26
data.map(x => (x(0), 1)).reduceByKey((x,y) => x + y).sortBy(_._1).collect()
res9: Array[(Int, Int)] = Array((1,2), (2,1))
It works. But when I compile the same code with sbt assembly, it does not. The error message is:
[error] value sortBy is not a member of org.apache.spark.rdd.RDD[(Int, Int)]
[error] data.map(x => (x(0), 1)).reduceByKey((x,y) => x + y).sortBy(_._1) <= here is the problem.
My build.sbt is:
import AssemblyKeys._
assemblySettings
name := "buc"
version := "0.1"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.0" % "provided"
Is there a problem somewhere?
The first problem is that you are using Spark 1.0.0; if you read the documentation for that release, you won't find any sortBy method in the RDD class (it was only added in 1.1.0). So you should upgrade from 1.0.x to 2.0.x.
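If upgrading is not an option, you can get the same ordering on 1.0.x with sortByKey, which pair RDDs have had since the earliest releases. This is just a workaround sketch, assuming the x(0) key from your example:

import org.apache.spark.SparkContext._  // needed in Spark 1.0.x applications for reduceByKey/sortByKey

val counts = data.map(x => (x(0), 1)).reduceByKey((x, y) => x + y)
// counts is already keyed by the value we want to sort on,
// so sortByKey() is equivalent to sortBy(_._1) here
val sorted = counts.sortByKey().collect()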
On the other hand, the spark-mllib dependency pulls in the Spark MLlib library, which is not what you need here. For plain RDD operations, you need the dependency for spark-core:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0" % "provided"