I currently have a DataFrame like this:
+------------+----------+----------+
|mac         |time      |s         |
+------------+----------+----------+
|aaaaaaaaaaaa|11        |a         |
|aaaaaaaaaaaa|44        |c         |
|bbbbbbbbbbbb|22        |b         |
|aaaaaaaaaaaa|33        |a         |
+------------+----------+----------+
I want to use the .rdd function to group by the column "mac" and sort each group by the column "time". Here is an example of the expected result:
res5: Array[(Any, Iterable[(Any, Any)])] = Array((aaaaaaaaaaaa,CompactBuffer((11,a),(33,a),(44,c))), (bbbbbbbbbbbb,CompactBuffer((22,b))))
I can already group by the column "mac", but I still can't sort by "time":
df.rdd.map(x=>(x(0),(x(1),x(2)))).groupByKey()
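To see why grouping alone is not enough, here is a minimal sketch that uses plain Scala collections as a stand-in for the RDD (no Spark needed). It assumes "time" is an Int and "s" is a String, as the sample table suggests. Local groupBy happens to keep the arrival order inside each group, while Spark's groupByKey makes no ordering promise at all; in neither case are the values ordered by time.

```scala
// Plain Scala collections analogue of the grouping step (not Spark).
// Assumption: time is an Int and s is a String, matching the sample rows.
object GroupOnly {
  // The four sample rows from the question, as (mac, time, s) tuples.
  val rows: Seq[(String, Int, String)] = Seq(
    ("aaaaaaaaaaaa", 11, "a"),
    ("aaaaaaaaaaaa", 44, "c"),
    ("bbbbbbbbbbbb", 22, "b"),
    ("aaaaaaaaaaaa", 33, "a")
  )

  // Group by mac and keep only the (time, s) pairs.
  // Nothing here orders the pairs by time; they stay in input order.
  val grouped: Map[String, Seq[(Int, String)]] =
    rows.groupBy(_._1).map { case (mac, rs) =>
      mac -> rs.map { case (_, time, s) => (time, s) }
    }
}
```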
How can I do that?
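One way is to sort each group's values by time after groupByKey (the cast assumes the "time" column holds Int values; for a string column asInstanceOf[Int] throws a ClassCastException):
df.rdd.map(x=>(x(0),(x(1),x(2)))).groupByKey()
.mapValues(_.toSeq.sortBy(_._1.asInstanceOf[Int]))  // sort each group's (time, s) pairs by time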
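The same group-then-sort logic can be checked without Spark. This is a minimal sketch using plain Scala collections, where groupBy plus map plays the role of groupByKey plus mapValues. It assumes "time" is an Int and "s" is a String; typing the tuples up front removes the need for asInstanceOf.

```scala
// Plain Scala collections stand-in for the RDD pipeline (not Spark).
// Assumption: time is an Int and s is a String, as in the sample table.
object SortedGroups {
  // The sample rows from the question, as (mac, time, s) tuples.
  val rows: Seq[(String, Int, String)] = Seq(
    ("aaaaaaaaaaaa", 11, "a"),
    ("aaaaaaaaaaaa", 44, "c"),
    ("bbbbbbbbbbbb", 22, "b"),
    ("aaaaaaaaaaaa", 33, "a")
  )

  // Key by mac, group, then sort each group's (time, s) pairs by time.
  val result: Map[String, Seq[(Int, String)]] =
    rows
      .map { case (mac, time, s) => (mac, (time, s)) }
      .groupBy(_._1)
      .map { case (mac, pairs) => mac -> pairs.map(_._2).sortBy(_._1) }

  // Print the groups in mac order.
  def main(args: Array[String]): Unit =
    result.toSeq.sortBy(_._1).foreach { case (mac, vs) => println(s"$mac -> $vs") }
}
```

On a real DataFrame, the same typed tuples could be built with x.getString(0) and x.getInt(1) instead of x(0) and x(1), assuming those are the actual column types.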