I have created an RDD like below:
rdd=sc.parallelize([['A','C','B'], ['D','A','B','C'], ['C','B'],['B']])
I want to sort the inner list elements. For example, the first element inside the RDD is ['A','C','B'], but I want it sorted as ['A','B','C'].
My expected output is:
[['A','B','C'], ['A','B','C','D'], ['B','C'],['B']]
It is easier, and usually more efficient, to work with DataFrames rather than RDDs, since the Spark optimizer works on DataFrames whereas RDDs have to be optimized by hand:
from pyspark.sql.functions import *
df=spark.createDataFrame([[['A','C','B']], [['D','A','B','C']], [['C','B']],[['B']]],['l'])
df.show()
+------------+
| l|
+------------+
| [A, C, B]|
|[D, A, B, C]|
| [C, B]|
| [B]|
+------------+
df.withColumn('l',sort_array('l')).show()
+------------+
| l|
+------------+
| [A, B, C]|
|[A, B, C, D]|
| [B, C]|
| [B]|
+------------+
If you still want an RDD, you can always convert back:
rdd=df.withColumn('l',sort_array('l')).rdd
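Alternatively, if you'd rather stay in the RDD API entirely, a sketch of the pure-RDD route (assuming the `sc` and `rdd` from the question) is to map Python's built-in `sorted` over each inner list; the per-record transform itself is plain Python:

```python
data = [['A', 'C', 'B'], ['D', 'A', 'B', 'C'], ['C', 'B'], ['B']]

# sorted() returns a new, ascending-sorted list for each record
result = [sorted(xs) for xs in data]
print(result)
# [['A', 'B', 'C'], ['A', 'B', 'C', 'D'], ['B', 'C'], ['B']]

# On the RDD from the question, the same transform would be:
# sorted_rdd = rdd.map(sorted)
# sorted_rdd.collect() then yields the nested lists above
```

Note that a Python `map` like this runs outside the Catalyst optimizer, which is why the DataFrame `sort_array` route above is usually preferred at scale.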