I have a Spark Dataset, dataset. I need to call .collectAsList() for each column of the Dataset. How can I use .persist() and .unpersist() to avoid spending a huge amount of time on these operations?
Since I am new to Spark, I am not sure how to use the persist functions. Do I need to assign the result back, as in dataset = dataset.persist();, or would just calling dataset.persist() do the trick?
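If you just want to cache the dataset, calling dataset.persist() is enough: it returns the same Dataset, so assigning the result back is optional. Note that persist() is lazy, so the cache is only filled by the first action that runs on the dataset. Similarly, call dataset.unpersist() once you are done to remove all of its cached blocks from memory and disk.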
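For the per-column case in the question, a minimal sketch might look like the following. It assumes a Dataset<Row> and a local SparkSession. The class name PersistExample, the helper collectColumns, and the sample data built with spark.range are made up for illustration; only persist(), unpersist(), columns(), select() and collectAsList() come from the Spark API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PersistExample {

    // Hypothetical helper: collects every column of the dataset into a list,
    // caching the dataset so each per-column job can reuse the same data.
    static Map<String, List<Row>> collectColumns(Dataset<Row> dataset) {
        // persist() only marks the dataset for caching; nothing is computed yet.
        // It returns the same Dataset, so reassigning the result is optional.
        dataset.persist();
        Map<String, List<Row>> result = new LinkedHashMap<>();
        try {
            for (String column : dataset.columns()) {
                // The first action fills the cache; later ones can read from it.
                result.put(column, dataset.select(column).collectAsList());
            }
        } finally {
            // Release the cached blocks even if one of the collects fails.
            dataset.unpersist();
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: a local session, just to make the sketch runnable.
        SparkSession spark = SparkSession.builder()
                .appName("persist-example")
                .master("local[*]")
                .getOrCreate();

        // Stand-in data; replace with your own Dataset.
        Dataset<Row> dataset = spark.range(5).selectExpr("id", "id * 2 AS doubled");

        Map<String, List<Row>> columns = collectColumns(dataset);
        columns.forEach((name, rows) -> System.out.println(name + ": " + rows));

        spark.stop();
    }
}
```

Keep in mind that collectAsList() pulls all rows to the driver, so this only works when each column fits in driver memory. Caching helps here because the dataset's lineage is otherwise recomputed once per column.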