One can obtain an array of the partitions of a Spark DataFrame as follows:
> df.rdd.partitions
Is there a way to get more information about partitions? In particular, I would like to see the partition key and the partition boundaries (first and last element within a partition).
This is just for better understanding of how the data is organized.
This is what I tried:
> df.rdd.partitions.head
But this object only exposes the attributes and methods equals, hashCode, and index.
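There is no stored "partition key" to read back from a Partition object (hash repartitioning merely assigns rows to partitions), but the per-partition contents can be inspected with mapPartitionsWithIndex. Below is a minimal sketch under assumed names: a local SparkSession and a made-up key/value frame repartitioned on a key column.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a local SparkSession; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("partition-info")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample frame, hash-partitioned into 3 partitions on "key".
val df = (1 to 20)
  .map(i => (i % 3, i))
  .toDF("key", "value")
  .repartition(3, $"key")

// For each partition, report its index together with its first and last row.
// Note: this materializes each partition, so it is only suitable for small data.
val boundaries = df.rdd
  .mapPartitionsWithIndex { (idx, rows) =>
    val buf = rows.toVector
    Iterator((idx, buf.headOption, buf.lastOption))
  }
  .collect()

boundaries.foreach { case (idx, first, last) =>
  println(s"partition $idx: first=$first, last=$last")
}

spark.stop()
```

A partition can come back empty (hash collisions may send several keys to the same partition), which is why first and last are Options rather than bare Rows.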
If the data is not too large, one can also write it to disk and inspect the partitioning directly:

df.write.option("header", "true").csv("/tmp/foobar")

The target directory must not already exist. Each partition is written out as its own part-*.csv file, so the directory contents mirror how the rows were partitioned.