hadoop, amazon-web-services, elastic-map-reduce

Using other files along with EMR streaming step?


I currently have a Hadoop command that I would like to replicate using the AWS SDK.

The command I'm currently using:

hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input /no_dups -output /sorted -mapper mapper.py -reducer reducer.py -file mapper.py reducer.py other_file1.py other_file2.py

As far as I can see, the StreamingStep class doesn't provide a way to let Hadoop know that other files will be needed, along with the mapper and reducer.

Is this functionality available?


Solution

  • I solved this by passing the -file option, together with the files I needed, in the arguments given to HadoopJarStepConfig.

    See this question
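
A minimal sketch of that approach, assuming the step is described as a plain dictionary shaped like the EMR API's step definition (a name, a failure action, and a HadoopJarStep with Jar and Args). The helper name build_streaming_step, the step name, and the CONTINUE failure action are illustrative choices, not taken from the original answer. The jar path and file names are the ones from the question. The sketch repeats -file once per file, which is the form shown in the Hadoop streaming documentation.

```python
def build_streaming_step(name, jar, input_path, output_path,
                         mapper, reducer, files):
    """Build a step definition whose HadoopJarStep carries the streaming args.

    Hypothetical helper: it only assembles the argument list, it does not
    contact AWS.
    """
    args = [
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    # Ship every required file with the job by adding one -file pair per file.
    for path in files:
        args += ["-file", path]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumption: keep the cluster running on failure
        "HadoopJarStep": {"Jar": jar, "Args": args},
    }


step = build_streaming_step(
    name="sort-step",  # illustrative name
    jar="/home/hadoop/contrib/streaming/hadoop-streaming.jar",
    input_path="/no_dups",
    output_path="/sorted",
    mapper="mapper.py",
    reducer="reducer.py",
    files=["mapper.py", "reducer.py", "other_file1.py", "other_file2.py"],
)
print(step["HadoopJarStep"]["Args"])
```

With boto3, a dictionary like this could then be submitted through the EMR client's add_job_flow_steps call, passing the cluster's ID as JobFlowId and Steps=[step]. The file paths must resolve on the node where the step runs.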