Suppose I have a Spark RDD like the following:
id | data
----------
1 | "a"
1 | "b"
2 | "c"
3 | "d"
How can I write this out as separate JSON text files, grouped by id, so that part-0000-1.json contains rows "a" and "b", part-0000-2.json contains "c", and so on?
Convert the RDD to a DataFrame first, then use:

df.write.partitionBy("id").json(<path_to_file>)

Note that partitionBy writes one subdirectory per distinct value (id=1/, id=2/, and so on), each containing its own part files, rather than files named part-0000-1.json. If you need exactly one file per id, also repartition on the column (e.g. df.repartition("id")) before writing so each partition directory holds a single part file.