caching, apache-spark, rdd, persist

Spark: How do I save an RDD before unpersisting it?


I want to unpersist myRDD after the data is saved. I have the following code:

  val isDone = myRDD.saveAsTextFile("myOutput")
  if (isDone != null) {
    myRDD.unpersist()
  }

but the line:

isDone != null

produces the compiler warning: comparing values of types Unit and Null using `!=' will always yield true

What is the correct way to solve this problem? Thank you!


Solution

  • This should work fine:

    myRDD.saveAsTextFile("myOutput")
    myRDD.unpersist()
    

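    The data will be saved before the RDD is unpersisted: saveAsTextFile is an action, so the call blocks until the write job has finished, and it throws an exception if the job fails. The unpersist() line is therefore only reached after a successful save.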

    Note that the saveAsTextFile method returns Unit. This is the type returned by any method (procedure) that has no useful result, and it has exactly one value, (). Testing the value returned by saveAsTextFile therefore achieves nothing. Also, because Unit is a subtype of AnyVal, a Unit value can never be equal to null, which is why you get this particular warning. The same thing happens for Ints:

    def foo(x: Int) = x != null
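
If you do want an explicit success flag, one option is to catch a failure of the save instead of inspecting its Unit result. The following is only a minimal sketch in plain Scala: the object SaveThenUnpersist and the helper saveThenCleanup are hypothetical names, not part of Spark, and the println calls in main are stand-ins for myRDD.saveAsTextFile("myOutput") and myRDD.unpersist(). It assumes a failed save surfaces as a non-fatal exception.

```scala
import scala.util.{Failure, Success, Try}

object SaveThenUnpersist {
  // Hypothetical helper, not part of Spark: evaluates `save`, and runs
  // `cleanup` only if `save` completed without throwing.
  // Returns true on success, false if `save` threw a non-fatal exception.
  def saveThenCleanup(save: => Unit)(cleanup: => Unit): Boolean =
    Try(save) match {
      case Success(_) =>
        cleanup
        true
      case Failure(e) =>
        // The save failed: skip the cleanup so cached data stays available
        // for a retry.
        Console.err.println(s"save failed: ${e.getMessage}")
        false
    }

  def main(args: Array[String]): Unit = {
    // Stand-ins for myRDD.saveAsTextFile("myOutput") and myRDD.unpersist().
    val ok = saveThenCleanup(println("saving..."))(println("unpersisting..."))
    println(s"saved: $ok")
  }
}
```

With a real RDD the call would presumably look like saveThenCleanup(myRDD.saveAsTextFile("myOutput"))(myRDD.unpersist()). Skipping the cleanup on failure is a design choice made here so that a retry can still use the cached data; if you would rather always release the cache, a try/finally around the save does that instead.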