python, hadoop-streaming, google-cloud-dataproc

Where are the application error logs?


In anticipation of having to debug our Python code by looking for the error messages in the log files, I have created a Hadoop Streaming job that throws an exception, but I can't locate the error message (or the stack trace).

Similar questions ("hadoop streaming: where are application logs?" and "hadoop streaming: how to see application logs?") use Python's logging module, which is not desirable here because Python already logs the error, so we shouldn't have to.

Here is the mapper code; for the reducer we use Hadoop's built-in aggregate.

#!/usr/bin/python
import sys, re
import random

def main(argv):
  line = sys.stdin.readline()
  pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")
  try:
    while line:
      for word in pattern.findall(line):
        print "LongValueSum:" + word.lower() + "\t" + "1"
        # Deliberate bug: raises ZeroDivisionError whenever randint returns 0.
        x = 1 / random.randint(0,99)
      line = sys.stdin.readline()
  except "end of file":
    return None
if __name__ == "__main__":
  main(sys.argv)

The x = 1 / random.randint(0,99) line is supposed to raise a ZeroDivisionError, and indeed the job fails, but grepping the log files doesn't show the error. Is there a special flag we need to set somewhere?

We have gone through the Google Dataproc documentation as well as the Hadoop Streaming documentation.


Solution

  • When you run a Cloud Dataproc job, the job driver output is streamed to the GCP Console, displayed in the command terminal window (for jobs submitted from the command line), and stored in Cloud Storage; see accessing job driver output. You can also find the log in Stackdriver under the name dataproc.job.driver.

    You can also enable YARN container logs when creating the cluster and view them in Stackdriver; see instructions.

    In addition, the yarn-userlogs log in Stackdriver might be useful.
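
Since whatever the mapper writes to stderr should end up in the YARN container logs, one way to make the failure easier to find there is to write the traceback to stderr under a fixed marker before exiting. The following is only a minimal sketch and has not been run on Dataproc. The MAPPER_FAILURE marker and the map_lines and run helpers are made-up names for illustration, and the deliberate division by zero from the question is left out.

```python
import re
import sys
import traceback

WORD = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")
MARKER = "MAPPER_FAILURE"  # hypothetical tag, chosen only to make grepping easy


def map_lines(lines, out):
    """Write one record for the aggregate reducer per word found in `lines`."""
    for line in lines:
        for word in WORD.findall(line):
            out.write("LongValueSum:" + word.lower() + "\t1\n")


def run(lines, out, err):
    """Run the mapper; on any exception, write the marker and the traceback
    to `err` and return 1 instead of 0."""
    try:
        map_lines(lines, out)
        return 0
    except Exception:
        err.write(MARKER + "\n")
        traceback.print_exc(file=err)
        err.flush()
        return 1
```

In a streaming job the script would finish by calling sys.exit(run(sys.stdin, sys.stdout, sys.stderr)), so that a failure still produces a non-zero exit status and the task is still reported as failed. Searching the container logs for the marker should then lead straight to the stack trace.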