Tags: scala, apache-spark, apache-spark-sql, rdd

How can I group and sort columns in spark.rdd


I currently have a DataFrame like this:

+------------+----------+----------+
|         mac|time      |s         |
+------------+----------+----------+
|aaaaaaaaaaaa|11        |a         |
|aaaaaaaaaaaa|44        |c         |
|bbbbbbbbbbbb|22        |b         |
|aaaaaaaaaaaa|33        |a         |
+------------+----------+----------+

I want to use the .rdd function, group by the column "mac", and sort by the column "time". Here is an example of the result I want:

res5: Array[(Any, Iterable[(Any, Any)])] = Array((aaaaaaaaaaaa,CompactBuffer((11,a),(33,a),(44,c))), (bbbbbbbbbbbb,CompactBuffer((22,b))))

I can already group by the column "mac", but I still can't sort by "time":

df.rdd.map(x=>(x(0),(x(1),x(2)))).groupByKey()

How can I do that?


Solution

df.rdd
  .map(x => (x(0), (x(1), x(2))))
  .groupByKey()
  .mapValues(_.toSeq.sortBy(_._1.asInstanceOf[Int]))
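For context, here is a minimal self-contained sketch of the whole flow, in spark-shell style. The DataFrame construction and the getAs accessors are my additions; it assumes the "time" column is an integer, so getAs[Int] replaces the asInstanceOf[Int] cast from the one-liner above.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("group-and-sort")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Rebuild the example DataFrame from the question.
val df = Seq(
  ("aaaaaaaaaaaa", 11, "a"),
  ("aaaaaaaaaaaa", 44, "c"),
  ("bbbbbbbbbbbb", 22, "b"),
  ("aaaaaaaaaaaa", 33, "a")
).toDF("mac", "time", "s")

// Key each row by "mac", group, then sort each group's values by "time".
val grouped = df.rdd
  .map(r => (r.getAs[String]("mac"), (r.getAs[Int]("time"), r.getAs[String]("s"))))
  .groupByKey()
  .mapValues(_.toSeq.sortBy(_._1))

grouped.collect().foreach(println)
// Expected output (exact collection type printed may vary):
// (aaaaaaaaaaaa,ArrayBuffer((11,a), (33,a), (44,c)))
// (bbbbbbbbbbbb,ArrayBuffer((22,b)))

One caveat: groupByKey ships every value for a key to a single task, so very large groups can hit memory limits. If that becomes a problem, repartitionAndSortWithinPartitions on the RDD side, or sorting followed by collect_list on the DataFrame side, are common alternatives.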