In this line, which RDD is being persisted? dropResultsN or dataSetN?
dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());
This question arose as a side issue from "Apache Spark timing forEach operation on JavaRDD", where I am still looking for a good answer to the core question of how best to time RDD creation.
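dropResultsN is the persisted RDD. The call chain is evaluated left to right: map() first produces a new RDD by applying standin.call() to each element of dataSetN, and persist() is then called on that new RDD, not on dataSetN.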
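To make the chaining argument concrete without needing a Spark cluster, here is a minimal sketch using a toy stand-in class. ToyRdd, its fields, and PersistChainDemo are hypothetical names invented for illustration; they are not Spark's API and only mimic the one behavior that matters here, namely that map() returns a new object and persist() marks the object it is called on and returns it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PersistChainDemo {

    /** Hypothetical toy stand-in for an RDD. NOT Spark's API. */
    static final class ToyRdd<T> {
        final String name;
        final List<T> data;
        boolean persisted = false;

        ToyRdd(String name, List<T> data) {
            this.name = name;
            this.data = data;
        }

        /** Returns a NEW ToyRdd; the receiver is left untouched. */
        <R> ToyRdd<R> map(Function<T, R> f) {
            List<R> out = new ArrayList<>();
            for (T t : data) {
                out.add(f.apply(t));
            }
            return new ToyRdd<>("map(" + name + ")", out);
        }

        /** Marks the receiver as persisted and returns it for chaining. */
        ToyRdd<T> persist() {
            persisted = true;
            return this;
        }
    }

    public static void main(String[] args) {
        ToyRdd<Integer> dataSetN = new ToyRdd<>("dataSetN", Arrays.asList(1, 2, 3));

        // Same shape as the line in the question: map(...) then persist().
        ToyRdd<Integer> dropResultsN = dataSetN.map(s -> s * 2).persist();

        System.out.println("dataSetN persisted: " + dataSetN.persisted);         // false
        System.out.println("dropResultsN persisted: " + dropResultsN.persisted); // true
    }
}
```

If you wanted dataSetN cached as well, you would have to call persist() on it separately before the map(). In real Spark, I believe you can confirm which RDD is marked by comparing getStorageLevel() on each of the two RDDs, but treat that as something to verify against your Spark version.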