Tags: java, scala, apache-spark, rdd, timing

Apache Spark: what am I persisting here?


In this line, which RDD is being persisted? dropResultsN or dataSetN?

dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());

This question arose as a side issue from "Apache Spark timing forEach operation on JavaRDD", where I am still looking for a good answer to the core question of how best to time RDD creation.


Solution

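  • dropResultsN is the persisted RDD. persist() is called on the RDD returned by map(), that is, the result of applying standin.call() to each element of dataSetN. dataSetN itself is not persisted by this line. Note that persist() only marks the RDD for caching; nothing is stored until the first action runs on dropResultsN.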
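
One way to confirm this is to compare the storage level of each RDD after the line runs. The following is a minimal sketch, not the asker's actual program: the class name PersistCheck, the helper isPersisted, the sample data, and the use of toUpperCase() in place of standin.call() are all assumptions made for illustration. It assumes Spark is on the classpath and runs in local mode.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class PersistCheck {

    // Hypothetical helper: an RDD counts as persisted when its storage
    // level is anything other than StorageLevel.NONE.
    static boolean isPersisted(JavaRDD<?> rdd) {
        return !rdd.getStorageLevel().equals(StorageLevel.NONE());
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("PersistCheck")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Sample data standing in for the real dataSetN (assumption).
            JavaRDD<String> dataSetN = sc.parallelize(Arrays.asList("a", "b", "c"));

            // Same shape as the line in the question; toUpperCase()
            // stands in for standin.call(s) (assumption).
            JavaRDD<String> dropResultsN =
                    dataSetN.map(s -> s.toUpperCase())
                            .persist(StorageLevel.MEMORY_ONLY());

            // persist() was called on the RDD returned by map(), so only
            // dropResultsN carries a storage level; dataSetN stays at NONE.
            System.out.println("dataSetN persisted?     " + isPersisted(dataSetN));
            System.out.println("dropResultsN persisted? " + isPersisted(dropResultsN));

            // The cache is only populated once an action runs.
            dropResultsN.count();
        }
    }
}
```

If dataSetN should be cached as well, it needs its own persist() call before the map().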