apache-spark caching persist

Using dataset.persist() and dataset.unpersist() in Java


I have a Spark Dataset named dataset, and I need to call .collectAsList() on each of its columns. How can I use .persist() and .unpersist() so that these operations do not take a huge amount of time?

Since I am new to this, I am not sure how to use the persist functions. Do I need to assign the result back, as in dataset = dataset.persist();, or is a plain dataset.persist() enough?


Solution

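  • If you just want to cache the dataset, call dataset.persist(). It returns the same Dataset, so assigning the result back is optional. Caching is lazy: the data is stored the first time an action such as collectAsList() runs, and later actions reuse it. Similarly, call dataset.unpersist() once you are done to remove all of its cached blocks from memory and disk.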
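
For the per-column case in the question, a minimal sketch might look like the following. It assumes a local SparkSession and a small made-up dataset with columns id and doubled; the class name PersistExample and the helper collectColumns are hypothetical names used only for illustration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PersistExample {

    // Hypothetical helper: collects each column of the dataset to the driver.
    static Map<String, List<Row>> collectColumns(Dataset<Row> dataset) {
        // persist() only marks the dataset for caching and returns the same
        // Dataset, so there is no need to reassign the result.
        dataset.persist();
        try {
            Map<String, List<Row>> columns = new LinkedHashMap<>();
            for (String name : dataset.columns()) {
                // The first collect computes and caches the data;
                // the remaining collects read from the cache.
                columns.put(name, dataset.select(name).collectAsList());
            }
            return columns;
        } finally {
            // Release the cached blocks once every column has been collected.
            dataset.unpersist();
        }
    }

    public static void main(String[] args) {
        // Assumption: local mode, for demonstration only.
        SparkSession spark = SparkSession.builder()
                .appName("persist-example")
                .master("local[*]")
                .getOrCreate();

        // Assumption: a tiny illustrative dataset with two columns.
        Dataset<Row> dataset = spark.range(5).selectExpr("id", "id * 2 AS doubled");

        Map<String, List<Row>> columns = collectColumns(dataset);
        columns.forEach((name, rows) ->
                System.out.println(name + ": " + rows.size() + " rows"));

        spark.stop();
    }
}
```

Putting unpersist() in a finally block is a design choice here, so the cache is released even if one of the collects fails. Note that collectAsList() brings all rows to the driver, so this only suits data that fits in driver memory.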