
Security-Configuration Field For AWS Data Pipeline EmrCluster


I created an AWS EMR cluster through the regular EMR cluster wizard in the AWS Management Console, and I was able to select a security configuration (when you export the CLI command, it appears as --security-configuration 'mySecurityConfigurationValue').

I now need to create a similar EMR cluster through AWS Data Pipeline, but I don't see any option where I can specify this security-configuration field.

The only similar fields I see are EmrManagedSlaveSecurityGroup, EmrManagedMasterSecurityGroup, AdditionalSlaveSecurityGroups, AdditionalMasterSecurityGroups, and SubnetId. I already have all of those filled out in my pipeline configuration; I just need to also specify the security configuration. Any thoughts?


Solution

  • Unfortunately, Data Pipeline does not support the Security Configurations feature (or other features introduced in the EMR 5.x versions, such as using a custom AMI).

    One solution for this is to:

    1. Replace the EmrCluster in your pipeline with an EC2 resource
    2. Use a ShellCommandActivity on the EC2 resource to run the aws emr create-cluster CLI command (this is where you pass --security-configuration)
    3. Use a bootstrap action to install and start TaskRunner on the cluster, configured with the worker group used in step 4
    4. Replace the runsOn properties that pointed at the EmrCluster with workerGroup, so those tasks run on the EMR cluster you created in step 2
    5. Add a final ShellCommandActivity at the end of the pipeline that terminates the cluster using the CLI (aws emr terminate-clusters)
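    The steps above can be sketched as a pipeline definition. The Python below is a minimal sketch, not a tested pipeline: it builds the JSON object list you would upload as a Data Pipeline definition. Every bucket, script path, worker group name, and job command in it is hypothetical, and passing the cluster ID from the create activity to the terminate activity through an S3 object is just one assumed design choice. It also assumes the command strings are executed by a shell on the EC2 resource and that the resource role has permission to call EMR.

```python
import json
import shlex

# Hypothetical values: substitute your own bucket, worker group and security configuration.
WORKER_GROUP = "my-emr-worker-group"
SECURITY_CONFIGURATION = "mySecurityConfigurationValue"
CLUSTER_ID_URI = "s3://my-bucket/pipeline-state/cluster-id"              # hypothetical
TASKRUNNER_BOOTSTRAP = "s3://my-bucket/bootstrap/install-taskrunner.sh"  # hypothetical script


def create_cluster_command(release_label: str = "emr-5.30.0") -> str:
    """Step 2: create the cluster via the CLI and save its ID to S3."""
    args = [
        "aws", "emr", "create-cluster",
        "--name", "pipeline-cluster",
        "--release-label", release_label,
        "--use-default-roles",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
        "--security-configuration", SECURITY_CONFIGURATION,
        # Step 3: hypothetical bootstrap script that installs and starts TaskRunner
        "--bootstrap-actions",
        f"Path={TASKRUNNER_BOOTSTRAP},Name=InstallTaskRunner,Args=[{WORKER_GROUP}]",
        # Print only the new cluster ID so the shell can capture it
        "--query", "ClusterId", "--output", "text",
    ]
    create = " ".join(shlex.quote(a) for a in args)
    return f'CLUSTER_ID=$({create}) && echo "$CLUSTER_ID" | aws s3 cp - {shlex.quote(CLUSTER_ID_URI)}'


def pipeline_definition() -> dict:
    """Build the pipeline objects for steps 1, 2, 4 and 5."""
    terminate = (
        "aws emr terminate-clusters --cluster-ids "
        f'"$(aws s3 cp {shlex.quote(CLUSTER_ID_URI)} -)"'
    )
    objects = [
        {"id": "Default", "name": "Default", "scheduleType": "ondemand",
         "failureAndRerunMode": "CASCADE",
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        # Step 1: a small EC2 resource replaces the EmrCluster object
        {"id": "Ec2Instance", "name": "Ec2Instance", "type": "Ec2Resource",
         "instanceType": "t2.micro", "terminateAfter": "1 Hour"},
        # Step 2: run create-cluster from the EC2 resource
        {"id": "CreateCluster", "name": "CreateCluster", "type": "ShellCommandActivity",
         "runsOn": {"ref": "Ec2Instance"}, "command": create_cluster_command()},
        # Step 4: workerGroup instead of runsOn, so TaskRunner on the cluster picks it up
        {"id": "MyEmrWork", "name": "MyEmrWork", "type": "ShellCommandActivity",
         "workerGroup": WORKER_GROUP, "dependsOn": {"ref": "CreateCluster"},
         "command": "spark-submit s3://my-bucket/jobs/job.py"},  # hypothetical job
        # Step 5: terminate the cluster, again from the EC2 resource
        {"id": "TerminateCluster", "name": "TerminateCluster", "type": "ShellCommandActivity",
         "runsOn": {"ref": "Ec2Instance"}, "dependsOn": {"ref": "MyEmrWork"},
         "command": terminate},
    ]
    return {"objects": objects}


if __name__ == "__main__":
    print(json.dumps(pipeline_definition(), indent=2))
```

    In this sketch TerminateCluster only runs when MyEmrWork succeeds, so a failed run would leave the cluster up; you would need to handle that case separately. The subnet and security-group fields you already set on the EmrCluster object can be passed to create-cluster through its --ec2-attributes option.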

    Because you are now spinning up the cluster with the CLI, you have access to features such as security configurations, custom AMIs, and instance fleets, and you can still orchestrate the tasks using Data Pipeline.
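    As a rough illustration of those extra features, the sketch below builds the create-cluster argument list with an optional custom AMI (--custom-ami-id, which needs EMR 5.7.0 or later) and optional instance fleets (--instance-fleets, which replaces --instance-type/--instance-count). The cluster name, release label, AMI ID, and fleet sizes are hypothetical placeholders.

```python
import json
import shlex
from typing import List, Optional


def create_cluster_args(security_configuration: str,
                        custom_ami_id: Optional[str] = None,
                        instance_fleets: Optional[list] = None) -> List[str]:
    """Build `aws emr create-cluster` arguments (hypothetical name and release label)."""
    args = ["aws", "emr", "create-cluster",
            "--name", "pipeline-cluster",
            "--release-label", "emr-5.30.0",
            "--use-default-roles",
            "--security-configuration", security_configuration]
    if custom_ami_id:
        args += ["--custom-ami-id", custom_ami_id]
    if instance_fleets:
        # Fleets are passed as JSON; they replace the uniform instance settings
        args += ["--instance-fleets", json.dumps(instance_fleets)]
    else:
        args += ["--instance-type", "m5.xlarge", "--instance-count", "3"]
    return args


if __name__ == "__main__":
    fleets = [
        {"InstanceFleetType": "MASTER", "TargetOnDemandCapacity": 1,
         "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},
        {"InstanceFleetType": "CORE", "TargetSpotCapacity": 2,
         "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"},
                                 {"InstanceType": "m4.xlarge"}]},
    ]
    args = create_cluster_args("mySecurityConfigurationValue",
                               custom_ami_id="ami-0123456789abcdef0",  # hypothetical AMI
                               instance_fleets=fleets)
    # Quote each argument so the result can be pasted into a ShellCommandActivity
    print(" ".join(shlex.quote(a) for a in args))
```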