According to the SparkJobServer documentation:
validate allows for an initial validation of the context and any provided configuration. If the context and configuration are OK to run the job, returning spark.jobserver.SparkJobValid will let the job execute, otherwise returning spark.jobserver.SparkJobInvalid(reason) prevents the job from running and provides means to convey the reason of failure. In this case, the call immediately returns an HTTP/1.1 400 Bad Request status code. validate helps you preventing running jobs that will eventually fail due to missing or wrong configuration and save both time and resources.
Can I therefore assume that validate() would always be called before runJob()?
If I load and verify the job configuration in validate(), can my runJob() assume it was loaded correctly and is available where validate() left it?
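Yes, your assumption is correct. The job manager calls validate() first and only goes on to call runJob() when validate() returns SparkJobValid. See https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/src/spark.jobserver/JobManagerActor.scala#L268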
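To make the ordering concrete, here is a minimal, dependency-free sketch of the validate-then-run pattern. It is not the jobserver's actual code: Valid and Invalid are stand-ins for spark.jobserver.SparkJobValid and SparkJobInvalid, and WordCountJob, submit and the "input.string" key are hypothetical names chosen for illustration. It also assumes that the same job instance is used for both calls, which is what lets runJob() read a field that validate() populated.

```scala
object ValidateThenRun {
  // Stand-ins for spark.jobserver.SparkJobValid / SparkJobInvalid (assumption:
  // simplified so the sketch runs without the jobserver on the classpath).
  sealed trait Validation
  case object Valid extends Validation
  final case class Invalid(reason: String) extends Validation

  // Hypothetical job: validate() checks the config and caches the parsed value
  // on the instance so that runJob() can pick it up later.
  class WordCountJob {
    private var input: Option[String] = None

    def validate(config: Map[String, String]): Validation =
      config.get("input.string") match {
        case Some(s) if s.nonEmpty =>
          input = Some(s)
          Valid
        case _ =>
          Invalid("No input.string config param")
      }

    def runJob(): Int = {
      // Fails loudly if validate() did not succeed on this instance first.
      val text = input.getOrElse(sys.error("runJob called without a successful validate"))
      text.split(" ").length
    }
  }

  // Simplified model of the dispatch: runJob() is reached only on the Valid
  // branch, and it is called on the same instance that was validated.
  def submit(job: WordCountJob, config: Map[String, String]): Either[String, Int] =
    job.validate(config) match {
      case Invalid(reason) => Left(reason) // corresponds to the 400 Bad Request case
      case Valid           => Right(job.runJob())
    }
}
```

In this sketch, a config containing "input.string" reaches runJob(), while a config without it is rejected before runJob() is ever called. If you rely on state cached in validate(), keeping it in an instance field as above only works as long as both calls hit the same object, so it is worth confirming that against the jobserver version you deploy.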