Tags: apache-spark, pyspark, apache-spark-sql, spark-streaming

How to split line into (key, value) pair with fixed size key


I am new to Apache Spark. I have a file where, in every line, the first 10 characters are the key and the rest is the value. How do I split each line and apply a Spark sort, so that in the end I get a [key, value] pair RDD as output?


Solution

  • map with take and drop should do the trick:

    val pairs = rdd.map(line => (line.take(10), line.drop(10)))
    

    Sort by key (possible now that the RDD holds pairs):

    val sorted = pairs.sortByKey()
    

    Prepare output:

    val lines = sorted.map { case (k, v) => s"$k $v" }
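
    Putting it together, here is a minimal end-to-end sketch. The input
    path, output path, and app name are placeholders, and it assumes
    every line is at least 10 characters long:

    import org.apache.spark.{SparkConf, SparkContext}

    object FixedKeySort {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("fixed-key-sort"))

        // Each element of the RDD is one line of the file.
        val rdd = sc.textFile("hdfs:///path/to/input.txt")  // placeholder path

        // First 10 characters -> key, remainder -> value.
        val pairs = rdd.map(line => (line.take(10), line.drop(10)))

        // Sort by the fixed-size key.
        val sorted = pairs.sortByKey()

        // Re-join key and value for plain-text output.
        val lines = sorted.map { case (k, v) => s"$k $v" }
        lines.saveAsTextFile("hdfs:///path/to/output")  // placeholder path

        sc.stop()
      }
    }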