I am new to Apache Spark. I have a file where, in every line, the first 10 characters are a key and the rest is the value. How do I apply a Spark sort to it, extracting the first 10 characters of each line as the key and the rest as the data, so that in the end I get a [key, value] pair RDD as output?
map with take and drop should do the trick:
val pairs = rdd.map(line => (line.take(10), line.drop(10)))

Sort:

val sorted = pairs.sortByKey()
Prepare output:
val lines = sorted.map { case (k, v) => s"$k $v" }
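Putting it all together, a minimal end-to-end sketch (assuming a SparkContext named sc and a hypothetical input path input.txt — adjust both to your setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeyValueSort {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KeyValueSort")
    val sc = new SparkContext(conf)

    // Read the file; each element of the RDD is one line
    val rdd = sc.textFile("input.txt")

    // Split each line: first 10 characters become the key, the rest the value
    val pairs = rdd.map(line => (line.take(10), line.drop(10)))

    // Sort by key (ascending by default); yields an RDD[(String, String)]
    val sorted = pairs.sortByKey()

    // Optionally format back into lines and write out
    val lines = sorted.map { case (k, v) => s"$k $v" }
    lines.saveAsTextFile("output")

    sc.stop()
  }
}
```

Note that take(10) and drop(10) are safe even on lines shorter than 10 characters: take returns the whole string and drop returns an empty string, so no exception is thrown.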