Is there a way to capture the console output of a spark job in Oozie? I want to use the specific printed value in the next action node after the spark job.
I was thinking I could use ${wf:actionData("action-id")["Variable"]},
but it seems that Oozie cannot capture output from a Spark action node. In a Shell action, by contrast, you can just use echo "var=12345"
and then call wf:actionData to use the value as an Oozie variable across the workflow.
I want to do this so that I can print the number of records processed, store it as an Oozie variable, and use it in the next action nodes of the workflow. I would like to avoid anything that requires storing the data outside the workflow, such as saving it in a table or setting it as a system variable from inside the Spark Scala program.
Any help would be greatly appreciated, since I'm still a novice Spark programmer. Thank you very much.
As the Spark action does not support capture-output, you'll have to write the data to a file on HDFS. This post explains how to do that from Spark.
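For illustration, here is a minimal sketch of writing the value to HDFS from the Spark Scala program, using the Hadoop FileSystem API. The object name RecordCountWriter, the helper names, the key "var", and the output path are all hypothetical choices for this example, not something Oozie or Spark prescribes.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: names and layout are assumptions for this sketch.
object RecordCountWriter {

  // Formats the value as a single key=value line, the same shape
  // a Shell action would echo for capture-output.
  def propertyLine(key: String, value: Long): String = s"$key=$value\n"

  // Writes the key=value line to the given HDFS path, overwriting any existing file.
  def writeToHdfs(conf: Configuration, outputPath: String, key: String, value: Long): Unit = {
    val path = new Path(outputPath)
    val fs = path.getFileSystem(conf)
    val out = fs.create(path, true) // true = overwrite
    try out.write(propertyLine(key, value).getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }
}
```

Inside the job this could be called as RecordCountWriter.writeToHdfs(spark.sparkContext.hadoopConfiguration, "/tmp/my-wf/record_count.properties", "var", df.count()), where spark, df, and the path are placeholders. A Shell action with capture-output placed after the Spark action can then run hdfs dfs -cat on that file, so the var=... line reaches stdout and becomes available through wf:actionData as in the Shell example from the question.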