I'm trying to add a job step via the AWS SDK for PHP. I can successfully start a cluster and a new job flow through the API, but I get an error when trying to create a Hadoop Streaming step.
Here is my code:
// add some jobflow steps
$response = $emr->add_job_flow_steps($JobFlowId, array(
    new CFStepConfig(array(
        'Name' => 'MapReduce Step 1. Test',
        'ActionOnFailure' => 'TERMINATE_JOB_FLOW',
        'HadoopJarStep' => array(
            'Jar' => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
            // ERROR IS HERE! How can we pass the parameters?
            'Args' => array(
                '-input s3://logs-input/appserver1 -output s3://logs-input/job123/ -mapper s3://myscripts/mapper-apache.php -reducer s3://myscripts/reducer.php',
            ),
        )
    )),
));
I get an error like: Invalid streaming parameter '-input s3://.... -output s3://..... -mapper s3://....../mapper.php -reducer s3://...../reducer.php'
So how do I pass the arguments to the Hadoop Streaming JAR?
The official AWS SDK for PHP documentation doesn't provide any examples of this.
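This worked for me. Pass each option and each value as its own element of the Args array instead of one space-separated string:
'Args' => array(
    '-input', 's3://mybucket/in/',
    '-output', 's3://mybucket/oo/',
    '-mapper', 's3://mybucket/c/mapperT1.php',
    '-reducer', 's3://mybucket/c/reducerT1.php'
)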
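As far as I can tell, the single-string version fails because every element of Args reaches the streaming JAR as one command-line argument, so the whole string arrives as a single unrecognized parameter. If you already have the options as one string, here is a minimal sketch that splits it into separate elements. Note that split_streaming_args() is a hypothetical helper, not part of the AWS SDK for PHP, and it assumes that no option or path contains whitespace or needs shell-style quoting.

```php
<?php
// Hypothetical helper (NOT part of the AWS SDK for PHP): split a
// command-line style string into one array element per token.
// Assumption: no token contains whitespace or needs shell-style quoting.
function split_streaming_args($commandLine)
{
    // Split on runs of whitespace and drop any empty pieces.
    return preg_split('/\s+/', trim($commandLine), -1, PREG_SPLIT_NO_EMPTY);
}

$args = split_streaming_args(
    '-input s3://logs-input/appserver1 -output s3://logs-input/job123/ ' .
    '-mapper s3://myscripts/mapper-apache.php -reducer s3://myscripts/reducer.php'
);

// Show the resulting list: each flag and each value is its own element.
print_r($args);
```

The resulting array could then be used directly as 'Args' => $args inside the 'HadoopJarStep' array from the question.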