
How can we pass arguments to Hadoop Streaming from the AWS SDK for PHP?

I'm trying to add a job via the AWS SDK for PHP. I can successfully start a cluster and a new job flow through the API, but I get an error when I try to create a Hadoop Streaming step.

Here is my code:

// add some jobflow steps
$response = $emr->add_job_flow_steps($JobFlowId, array(
    new CFStepConfig(array(
        'Name' => 'MapReduce Step 1. Test',
        'ActionOnFailure' => 'TERMINATE_JOB_FLOW',
        'HadoopJarStep' => array(
            'Jar' => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
            // ERROR IS HERE! How can we pass the parameters?
            'Args' => array(
                '-input s3://logs-input/appserver1 -output s3://logs-input/job123/ -mapper s3://myscripts/mapper-apache.php -reducer s3://myscripts/reducer.php',
            ),
        ),
    )),
));

I'm getting an error like: Invalid streaming parameter '-input s3://.... -output s3://..... -mapper s3://....../mapper.php -reducer s3://...../reducer.php'

So how can I pass the arguments to the Hadoop Streaming JAR?

The official AWS SDK for PHP documentation doesn't provide any examples or documentation for this.

Possibly related unanswered thread:

Pass parameters to hive script using aws php sdk


  • This worked for me:

    'Args' => array( '-input','s3://mybucket/in/','-output','s3://mybucket/oo/',
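The error suggests the streaming JAR receives the whole string as a single argument and therefore doesn't recognize it as a flag; the working version passes each flag and each value as its own array element. Below is a minimal sketch of building that array from the question's command-line string, assuming no path contains spaces or quotes. `streaming_args()` is a hypothetical helper, not an AWS SDK function; the S3 paths are the ones from the question.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): split a Hadoop Streaming
// command-line string into one token per array element, which is the shape
// the 'Args' key of 'HadoopJarStep' needs.
// Assumption: no argument contains spaces or quotes; quoted values would
// need a proper shell-style tokenizer instead of a whitespace split.
function streaming_args(string $commandLine): array
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
    return preg_split('/\s+/', trim($commandLine), -1, PREG_SPLIT_NO_EMPTY);
}

$args = streaming_args(
    '-input s3://logs-input/appserver1 -output s3://logs-input/job123/ '
    . '-mapper s3://myscripts/mapper-apache.php -reducer s3://myscripts/reducer.php'
);

// Each flag and its value are now separate elements, e.g.
// '-input', 's3://logs-input/appserver1', '-output', ...
$hadoopJarStep = array(
    'Jar'  => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
    'Args' => $args,
);

print_r($hadoopJarStep['Args']);
```

The resulting `$hadoopJarStep` array could then be used as the `'HadoopJarStep'` value inside the `CFStepConfig` from the question. Writing the elements out by hand, as in the snippet above, works just as well; the helper only avoids splitting a long string manually.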