Search code examples
scalagojaremr

What is the correct way to use the AddJobFlowStep in the AWS EMR sdk?


I've used the go AWS sdk to create a cluster and added a job flow step to it. However the execution of the step always fails when I do it programatically. An interesting point to notice is that when I attach the jar from the UI, it successfully executes.

So when the jar is attached from the UI, this is the outcome of the step execution(it runs successfully and moves to the COMPLETED state): (Copying the full text)

JAR location : command-runner.jar Main class : None Arguments : spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar Action on failure: Continue

However, this is the output of the step when I try programatically:

Status :FAILED Reason : Main Class not found. Log File : s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-337M25MLV3BHT/stderr.gz Details : Caused by: java.lang.ClassNotFoundException: scala.reflect.api.TypeCreator JAR location : s3://mdv-testing/Util-assembly-1.0.jar Main class : None > Arguments : spark-submit "--class Hello" Action on failure: Cancel and wait

I tried various combinations for the arguments and realised that the command-runner.jar was never present. I accordingly made changes to the code and send the command-runner.jar as the argument now. This now reflects the same details as the step that executes successfully. This is the revised output:

Status :FAILED Reason : Unknown Error. Log File : s3://mdv-testing/awsLogs/j-3RW9K14BS6GLO/steps/s-3NI5ZO15VTWQK/ JAR location : command-runner.jar Main class : None Arguments : "spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar Action on failure: Cancel and wait

Go Code

package main
import (
"fmt"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/emr"
)

func main() {
sess := session.New(&aws.Config{Region: aws.String("us-east-1")})
svc := emr.New(sess)

params := &emr.AddJobFlowStepsInput{
JobFlowId: aws.String("j-3RW9K14BS6aaa"),
Steps: []*emr.StepConfig{
{
    ActionOnFailure: aws.String("CANCEL_AND_WAIT"), //TERMINATE_CLUSTER"),
    HadoopJarStep: &emr.HadoopJarStepConfig{
    Args: []*string{
                     aws.String("spark-submit --deploy-mode cluster --class Hello s3://mdv-testing/Util-assembly-1.0.jar"),
                   },
                     Jar: aws.String("command-runner.jar"), },
                     Name: aws.String("ReportJarExecution"),
    },
},
}

resp, err := svc.AddJobFlowSteps(params)

if err != nil {
// Print the error, cast err to awserr. sError to get the Code and
// Message from an error.
fmt.Println(err.Error())
return
}

// Pretty-print the response data.
fmt.Println(resp)
}

Can someone please help me !!! I think I'm pretty close to the solution but it is evading me big time :(


Solution

  • I managed to solve this issue. For anyone who is struggling with something similar, the answer is that we need to send the arguments separately in an array.