Tags: powershell, amazon-web-services, emr, amazon-emr

AWS EMR job using PowerShell Cmdlet


I have a Pig script that accepts some arguments, and I need to run it using only the AWS PowerShell cmdlets. I am able to create a cluster with Pig installed using the command below:

$app = New-Object  Amazon.ElasticMapReduce.Model.Application
$app.Name="Pig"
$jobid = Start-EMRJobFlow -Name "Pig Job" `
    -Application $app `
    -Instances_MasterInstanceType "m3.xlarge" `
    -Instances_KeepJobFlowAliveWhenNoSteps $true `
    -Instances_InstanceCount 1 `
    -LogUri "s3://mybucket/logs" `
    -VisibleToAllUsers $true `
    -ReleaseLabel "emr-5.7.0" `
    -SecurityConfiguration "my-sec-grp" `
    -JobFlowRole "EMR_EC2_DefaultRole" `
    -ServiceRole "EMR_DefaultRole"

But I am not able to add a step for the Pig job. The articles I found are either very old or use a custom JAR to submit the job. I just need to submit a Pig script that accepts some parameters. Any help will be highly appreciated.

Note: I need PowerShell-specific commands. I am already able to do this using the AWS CLI.


Solution

  • I found a way to submit Pig scripts from PowerShell. I was following this link, but it only covers Hive scripts. There, the arguments for the step are built as:

    $runhivescriptargs = @("s3://us-east-1.elasticmapreduce/libs/hive/hive-script", `
             "--base-path", "s3://us-east-1.elasticmapreduce/libs/hive", `
             "--hive-versions","latest", `
             "--run-hive-script", `
             "--args", `
             "-f", "s3://elasticmapreduce/samples/hive-ads/libs/join-clicks-to-impressions.q", `
             "-d", "SAMPLE=s3://elasticmapreduce/samples/hive-ads",`
             "-d", "DAY=2009-04-13", `
             "-d", "HOUR=08", `
             "-d", "NEXT_DAY=2009-04-13", `
             "-d", "NEXT_HOUR=09",`
             "-d", "INPUT=s3://elasticmapreduce/samples/hive-ads/tables", `
             "-d", "OUTPUT=s3://my-output-bucket/joinclick1", `
             "-d", "LIB=s3://elasticmapreduce/samples/hive-ads/libs")
    

    I followed the same steps, but for Pig scripts the parameters must be passed with the -p option rather than the -d option. So my step arguments look like this:

    $runpigscriptargs = @("s3://us-east-1.elasticmapreduce/libs/pig/pig-script", `
             "--base-path", "s3://us-east-1.elasticmapreduce/libs/pig", `
             "--run-pig-script", `
             "--args", `
             "-f", $scriptfile, `
             "-p", "Id=$Id",`
             "-p", "jarPath=$jarPath",`
             "-p", "inputPath=$newInputPath", `
             "-p", "outputPath=$outputPath")
    

    I am not specifying a Pig version because I had already created an EMR cluster with the latest version of Pig installed. Thanks.
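For completeness, here is a rough sketch of how the argument array could be built from a table of Pig parameters and then wrapped into a step. The function names `ConvertTo-PigParameterArgs` and `New-PigStepConfig`, the script path, and the parameter values are made up for illustration. The script-runner JAR location is also an assumption on my part, since the answer above does not say which JAR runs the `pig-script` launcher. The last two commented lines need the AWS Tools for PowerShell module and credentials, so treat them as untested.

```powershell
# Turn a table of Pig parameters into the flat "-p", "name=value" pairs
# that follow "--args" in the step arguments.
function ConvertTo-PigParameterArgs {
    param([System.Collections.IDictionary]$Parameters)

    $pairs = New-Object 'System.Collections.Generic.List[string]'
    foreach ($key in $Parameters.Keys) {
        $pairs.Add('-p')
        $pairs.Add("$key=$($Parameters[$key])")
    }
    # The leading comma keeps the array from being unrolled on return.
    return ,$pairs.ToArray()
}

# Wrap the arguments in a StepConfig. The AWS types are only touched when
# this function is called, so defining it does not require the AWS module.
function New-PigStepConfig {
    param(
        [string]$Name,
        [string[]]$StepArgs,
        # Assumption: script-runner.jar is the JAR that executes pig-script.
        [string]$Jar = 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
        [string]$ActionOnFailure = 'CANCEL_AND_WAIT'
    )

    $jarStep = New-Object Amazon.ElasticMapReduce.Model.HadoopJarStepConfig
    $jarStep.Jar = $Jar
    $jarStep.Args = [System.Collections.Generic.List[string]]$StepArgs

    $step = New-Object Amazon.ElasticMapReduce.Model.StepConfig
    $step.Name = $Name
    $step.ActionOnFailure = $ActionOnFailure
    $step.HadoopJarStep = $jarStep
    return $step
}

# Hypothetical parameter values; [ordered] keeps the -p pairs in this order.
$pigParams = [ordered]@{
    Id         = '42'
    inputPath  = 's3://mybucket/input'
    outputPath = 's3://mybucket/output'
}

$stepArgs = @(
    's3://us-east-1.elasticmapreduce/libs/pig/pig-script',
    '--base-path', 's3://us-east-1.elasticmapreduce/libs/pig',
    '--run-pig-script',
    '--args',
    '-f', 's3://mybucket/scripts/job.pig'   # hypothetical script location
) + (ConvertTo-PigParameterArgs $pigParams)

# Submitting to the running cluster (requires the AWS module and credentials):
# $step = New-PigStepConfig -Name 'Run Pig script' -StepArgs $stepArgs
# Add-EMRJobFlowStep -JobFlowId $jobid -Step $step
```

Here `$jobid` is the value returned by `Start-EMRJobFlow` in the question. Building the `-p` pairs from an ordered table keeps the parameter list in one place, which helps when the script takes many parameters.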