
Hadoop job error in the Dataproc GUI on Google Cloud


I am trying to create a WordCount job using org.apache.hadoop.examples.WordCount, but it fails with errors.

I am attaching images of my job configuration and of where the files are located in my bucket (I am using buckets, not HDFS).

Job configuration used: [image: job configuration]

Files are stored in the bucket; the second file in the screenshot is hadoop-mapreduce-examples.jar.

URI for hadoop-mapreduce-examples.jar: [image]

The error I got when I used the above configuration:

Job failed with message [Exception in thread "main" java.lang.reflect.InvocationTargetException]. Additional details can be found at:
https://console.cloud.google.com/dataproc/jobs/job-58ef7440?project=hadoop-304309&region=us-central1
gcloud dataproc jobs wait 'job-58ef7440' --region 'us-central1' --project 'hadoop-304309'
https://console.cloud.google.com/storage/browser/wordbucket01/google-cloud-dataproc-metainfo/7e251bd2-bd3f-4915-aea3-fba5789e6ee3/jobs/job-58ef7440/
gs://wordbucket01/google-cloud-dataproc-metainfo/7e251bd2-bd3f-4915-aea3-fba5789e6ee3/jobs/job-58ef7440/driveroutput

The job output: [image: job output]

The "5 more" at the end of the output is not clickable and cannot be expanded, so I can't see what is in those five remaining lines.

The driver output file: [image: DriverOutputFile]


Solution

  • The problem is that, by default, Hadoop refuses to write to an output path that already exists unless an "overwrite" mode has been explicitly requested. You have two options:

    • Delete the output path before running the example
    • Use another output path for each run

    In general, the job output can be found in the Google Cloud console: look for the "Jobs" page under Dataproc, or for the Jobs tab on the cluster page. Also, as the links in the error message show, the driver output is saved to GCS, so you can always retrieve it from there.
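
The two options above could be scripted roughly as follows. This is only a sketch: it builds the command lines rather than running them, and the helper names (unique_output_uri, delete_output_args, submit_args), the "wordcount-output" prefix, and any cluster name, jar URI, or input path you pass in are illustrative assumptions, not values taken from the question. The gsutil and gcloud invocations use the standard `gsutil rm -r` and `gcloud dataproc jobs submit hadoop` forms.

```python
from datetime import datetime, timezone


def unique_output_uri(bucket, prefix="wordcount-output", now=None):
    """Option 2: return a gs:// output URI that changes on every run.

    A UTC timestamp (second precision) is appended to a hypothetical prefix,
    so two runs started in the same second would still collide.
    """
    now = now or datetime.now(timezone.utc)
    return f"gs://{bucket}/{prefix}-{now:%Y%m%d-%H%M%S}"


def delete_output_args(output_uri):
    """Option 1: command that removes the old output path before a re-run."""
    return ["gsutil", "rm", "-r", output_uri]


def submit_args(cluster, region, jar_uri, input_uri, output_uri):
    """Command that submits the WordCount example as a Dataproc Hadoop job.

    Everything after "--" is passed to WordCount itself: input, then output.
    """
    return [
        "gcloud", "dataproc", "jobs", "submit", "hadoop",
        f"--cluster={cluster}",
        f"--region={region}",
        "--class=org.apache.hadoop.examples.WordCount",
        f"--jars={jar_uri}",
        "--",
        input_uri,
        output_uri,
    ]
```

Each list can be handed to subprocess.run(args, check=True) on a machine where the Cloud SDK is installed and authenticated. Using a fresh output path per run keeps earlier results around, whereas deleting the path first keeps the bucket tidy; which one fits depends on whether you still need the old output.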