Search code examples
phpamazon-web-serviceselastic-map-reducehadoop-streamingamazon-emr

How can we pass arguments for Hadoop Streaming from AWS SDK for PHP?


I'm trying to add some job via AWS SDK for PHP. I'm able to successfully start a cluster and start new job flow via API but I'm getting an error while trying to create Hadoop Streaming step.

Here is my code:

// add some jobflow steps
$response = $emr->add_job_flow_steps($JobFlowId, array(
    new CFStepConfig(array(
        'Name' => 'MapReduce Step 1. Test',
        'ActionOnFailure' => 'TERMINATE_JOB_FLOW',
        'HadoopJarStep' => array(
    'Jar' => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
            // ERROR IS HERE!!!! How can we pas the parameters?
    'Args' => array(
                '-input s3://logs-input/appserver1 -output s3://logs-input/job123/ -mapper s3://myscripts/mapper-apache.php -reducer s3://myscripts/reducer.php',
              ),
        )
   )),
));

I'm getting error like: Invalid streaming parameter '-input s3://.... -output s3://..... -mapper s3://....../mapper.php -reducer s3://...../reducer.php"

So it is not clear how can I pass the arguments to Hadoop Streaming JAR ?

Official AWS SDK for PHP documentation doesn't provides any examples or documentation.

Possibly related unanswered thread:

Pass parameters to hive script using aws php sdk


Solution

  • This worked for me:

    'Args' => array( '-input','s3://mybucket/in/','-output','s3://mybucket/oo/',
                    '-mapper','s3://mybucket/c/mapperT1.php',
                        '-reducer','s3://mybucket/c/reducerT1.php')