java, hadoop, google-cloud-platform, mapreduce, google-cloud-dataproc

What is the best way to migrate Java Hadoop jobs to Dataproc?


I'm following the example from Google.

In my old code, I submit the job like the following:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "word count"); // job name set here
job.setJarByClass(getClass()); // class that contains mapper and reducer
job.setMapSpeculativeExecution(false);
job.setCombinerClass(MyCombiner.class);  // placeholder for the real combiner
job.setReducerClass(MyReducer.class);    // placeholder for the real reducer
job.setReduceSpeculativeExecution(false);
// some additional configs
job.submit();

How can I migrate this job to Dataproc? I tried to follow this answer - How do you use the Google DataProc Java Client to submit spark jobs using jar files and classes in associated GS bucket? - using a HadoopJob instead of a SparkJob. But the main issue is that this approach requires submitting a jar and a main class. Is there a way to simply migrate the existing Job class and run the job on Dataproc?
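
For context, here is roughly what that approach looks like with the google-cloud-dataproc Java client; the project, region, cluster, bucket, and main-class names below are just placeholders:

import com.google.cloud.dataproc.v1.HadoopJob;
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobPlacement;

// The job controller endpoint is regional.
JobControllerSettings settings = JobControllerSettings.newBuilder()
    .setEndpoint("us-central1-dataproc.googleapis.com:443")
    .build();
try (JobControllerClient client = JobControllerClient.create(settings)) {
  HadoopJob hadoopJob = HadoopJob.newBuilder()
      .setMainClass("com.example.WordCountDriver")      // placeholder main class
      .addJarFileUris("gs://my-bucket/word-count.jar")  // placeholder jar in GCS
      .build();
  Job job = Job.newBuilder()
      .setPlacement(JobPlacement.newBuilder().setClusterName("my-cluster"))
      .setHadoopJob(hadoopJob)
      .build();
  Job submitted = client.submitJob("my-project", "us-central1", job);
  System.out.println("Submitted " + submitted.getReference().getJobId());
}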


Solution

  • Given that code, you should be able to run the MapReduce jar directly.

    Hadoop jobs are configured for their cluster from the XML config files that exist on each node, not typically within the code itself.
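
    To make the jar directly runnable, it's enough to wrap the existing setup in a driver class with a main method. A minimal sketch, with the class and job names as placeholders and the mapper/combiner/reducer wiring elided:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class WordCountDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() loads the cluster's own XML config on each node,
        // so the same job code runs unchanged on Dataproc.
        Configuration conf = HBaseConfiguration.create(getConf());
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        // ... same mapper/combiner/reducer wiring and other configs as before ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
      }
    }

    Once the jar is in Cloud Storage, gcloud dataproc jobs submit hadoop --cluster=<cluster> --region=<region> --jar=gs://<bucket>/word-count.jar runs it against the cluster's own configuration, with no extra client code needed.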