Tags: python, apache-spark, ibm-cloud

How to increase the logging output for a spark-submit job on Bluemix?


I've submitted a Python job to Bluemix Spark as a Service and it has failed. Unfortunately, the logging output is insufficient and gives no clue as to why it failed.

How can I increase the verbosity of the log output?

Output from Spark as a Service:

==== Failed Status output =====================================================

Getting status
HTTP/1.1 200 OK
Server: nginx/1.8.0
Date: Thu, 12 May 2016 19:09:30 GMT
Content-Type: application/json;charset=utf-8
Content-Length: 850
Connection: keep-alive

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "ERROR",
  "message" : "Exception from the cluster:
org.apache.spark.SparkUserAppException: User application exited with 255
    org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:88)
    org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    java.lang.reflect.Method.invoke(Method.java:507)
    org.apache.spark.deploy.ego.EGOClusterDriverWrapper$$anon$3.run(EGOClusterDriverWrapper.scala:430)",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "xxxxxx",
  "success" : true
}
===============================================================================

I have run the same job successfully against a BigInsights cluster, where I also get much more verbose output.


Solution

  • stderr-%timestamp% and stdout-%timestamp% files are downloaded from the cluster to the local directory from which you ran spark-submit.sh. You'll normally find the cause of job failures in those two files; if they still aren't detailed enough, see the sketch below the reference.

    Reference: http://spark.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging
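
    If the downloaded logs still don't reveal the failure, one further option is to raise the log level from inside the driver program itself. This is a minimal sketch, assuming a standard PySpark entry point (the application name is a placeholder); SparkContext.setLogLevel has been available since Spark 1.4, so the service's Spark 1.6 supports it:

        from pyspark import SparkContext

        # "verbose-job" is a placeholder application name.
        sc = SparkContext(appName="verbose-job")

        # Lower the logging threshold so INFO and DEBUG messages reach the
        # stderr-%timestamp% file. Valid log4j levels include ALL, DEBUG,
        # INFO, WARN, ERROR, FATAL, OFF and TRACE.
        sc.setLogLevel("DEBUG")

        # ... the rest of the job runs unchanged ...

    With the level set to DEBUG, the stderr-%timestamp% file that spark-submit.sh pulls back should contain considerably more detail about why the application exited with status 255.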