I currently have a Hadoop command that I would like to replicate using the AWS SDK.
The command I'm currently using is:
hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar -input /no_dups -output /sorted -mapper mapper.py -reducer reducer.py -file mapper.py reducer.py other_file1.py other_file2.py
As far as I can see, the StreamingStep class doesn't provide a way to let Hadoop know that other files will be needed, along with the mapper and reducer.
Is this functionality available?
I solved this by passing the -file option to HadoopJarStepConfig with a list of the files I needed.
See this question
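For illustration, here is a rough sketch of how the argument list for that approach might be assembled. The StreamingArgs class and its build method are hypothetical helpers, not part of the AWS SDK, and I'm assuming each file gets its own -file flag, which is how the streaming jar's option is normally repeated. The sketch only builds and prints the list so it runs without the SDK; the commented line shows where it would be handed to HadoopJarStepConfig, assuming the SDK for Java v1 is on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingArgs {
    // Hypothetical helper: builds the arguments for the Hadoop streaming jar.
    public static List<String> build(String input, String output,
                                     String mapper, String reducer,
                                     List<String> extraFiles) {
        List<String> args = new ArrayList<>();
        args.addAll(Arrays.asList("-input", input, "-output", output));
        args.addAll(Arrays.asList("-mapper", mapper, "-reducer", reducer));

        // Ship the mapper, the reducer, and every extra file to the cluster.
        // Assumption: one -file flag per file.
        List<String> shipped = new ArrayList<>();
        shipped.add(mapper);
        shipped.add(reducer);
        shipped.addAll(extraFiles);
        for (String file : shipped) {
            args.add("-file");
            args.add(file);
        }
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = build("/no_dups", "/sorted", "mapper.py", "reducer.py",
                Arrays.asList("other_file1.py", "other_file2.py"));
        System.out.println(String.join(" ", args));

        // With the AWS SDK for Java v1 available (an assumption), the list
        // would be passed along roughly like this:
        // new HadoopJarStepConfig()
        //     .withJar("/home/hadoop/contrib/streaming/hadoop-streaming.jar")
        //     .withArgs(args);
    }
}
```

The resulting HadoopJarStepConfig would then be attached to a step and submitted with the job flow in place of the StreamingStep-generated config.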