apache-flink

How can I see if a job failed and why?


How can I use ClusterClient to check if a job failed and why?

ClusterClient#getJobStatus looks like the obvious first candidate, but it only reports whether the job failed, with no information about the exceptions that caused the failure.

The job is submitted with a detached client, so waiting for ClusterClient#run to return a JobExecutionResult is not an option.

I've also tried RestClusterClient#retrieveJob, but that does not work either; it fails with:

org.apache.flink.runtime.client.JobRetrievalException: Couldn't retrieve leading JobManager.
    at org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:157)
    at org.apache.flink.runtime.client.JobListeningContext.getClassLoader(JobListeningContext.java:141)
    at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:262)
    at org.apache.flink.client.program.ClusterClient.retrieveJob(ClusterClient.java:586)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:82)
    at org.apache.flink.runtime.client.JobListeningContext.getJobManager(JobListeningContext.java:152)
    ... 10 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at scala.concurrent.Await.result(package.scala)
    at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:80)
    ... 11 more


Solution

  • Use NewClusterClient#requestJobResult, which is available through a RestClusterClient.
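
A rough sketch of how the result of such a request might be turned into a "did it fail, and why" message. So that it runs without Flink on the classpath, the code uses stand-ins: the JobOutcome interface, the succeeded/failed factories, and the requestJobResult function parameter are all hypothetical and only mimic the shape of the real call. With Flink, the future would come from RestClusterClient#requestJobResult, and the success flag and failure cause would presumably be read from the returned job result; check that mapping against the Javadoc for your Flink version.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class JobFailureCheck {

    /** Hypothetical stand-in for the parts of a Flink job result used below. */
    public interface JobOutcome {
        boolean isSuccess();
        Optional<Throwable> failureCause();
    }

    /** Hypothetical factory: an outcome for a job that finished normally. */
    public static JobOutcome succeeded() {
        return new JobOutcome() {
            public boolean isSuccess() { return true; }
            public Optional<Throwable> failureCause() { return Optional.empty(); }
        };
    }

    /** Hypothetical factory: an outcome for a job that failed with the given cause. */
    public static JobOutcome failed(Throwable cause) {
        return new JobOutcome() {
            public boolean isSuccess() { return false; }
            public Optional<Throwable> failureCause() { return Optional.ofNullable(cause); }
        };
    }

    /** Walks the cause chain and returns the innermost exception. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /**
     * Asynchronously asks for the job result and summarizes it.
     * requestJobResult is an assumption: in real code it would wrap the
     * cluster client's requestJobResult call and adapt its result to JobOutcome.
     */
    public static CompletableFuture<String> describe(
            String jobId,
            Function<String, CompletableFuture<JobOutcome>> requestJobResult) {
        return requestJobResult.apply(jobId).thenApply(outcome -> {
            if (outcome.isSuccess()) {
                return "Job " + jobId + " finished successfully";
            }
            return outcome.failureCause()
                    .map(t -> "Job " + jobId + " failed: " + rootCause(t))
                    .orElse("Job " + jobId + " did not succeed, no cause reported");
        });
    }

    public static void main(String[] args) {
        // Simulated failure: an outer wrapper exception around the real cause.
        Throwable cause = new RuntimeException("outer", new IllegalStateException("boom"));
        String message = describe(
                "job-1", id -> CompletableFuture.completedFuture(failed(cause))).join();
        System.out.println(message);
    }
}
```

Because the request returns a future, this also fits the detached-submission case from the question: nothing blocks on ClusterClient#run, and the caller decides when (or whether) to join on the result.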