aws-cloudformation aws-cdk amazon-data-pipeline

Data Pipeline validation error when adding an EMR configuration to the pipeline object list in a CloudFormation template


If I add a "configuration" object to the Data Pipeline object list, I get this error:

Pipeline Definition failed to validate because of following Errors:
[{ObjectId = 'SampleEMRCluster', errors = [Fields with references 
to scheduable objects or preconditions can not be added to existing objects.
Found 'configuration']}]

Before adding it, synth and deploy succeed and the pipeline runs fine. Here is the relevant portion of the synthesized CloudFormation template:

    "PipelineObjects": [
      {
        "Fields": [
          {
            "Key": "type",
            "StringValue": "Default"
          },
          {
            "Key": "maxActiveInstances",
            "StringValue": "1"
          },
          {
            "Key": "scheduleType",
            "StringValue": "cron"
          },
          {
            "Key": "pipelineLogUri",
            "StringValue": {
              "Fn::Join": [
                "",
                [
                  "s3://",
                  {
                    "Ref": "sampleprodnaA928775C"
                  },
                  "/data-pipeline-logs/"
                ]
              ]
            }
          },
          {
            "Key": "role",
            "StringValue": {
              "Ref": "DPRoleprodna120283D1"
            }
          },
          {
            "Key": "resourceRole",
            "StringValue": {
              "Ref": "DPResourceRoleprodna6634AAB4"
            }
          },
          {
            "Key": "failureAndRerunMode",
            "StringValue": "CASCADE"
          },
          {
            "Key": "schedule",
            "RefValue": "DefaultSchedule"
          }
        ],
        "Id": "Default",
        "Name": "Default"
      },
      {
        "Fields": [
          {
            "Key": "type",
            "StringValue": "Schedule"
          },
          {
            "Key": "startAt",
            "StringValue": "FIRST_ACTIVATION_DATE_TIME"
          },
          {
            "Key": "period",
            "StringValue": "1 hour"
          }
        ],
        "Id": "DefaultSchedule",
        "Name": "Every 1 hour"
      },
      {
        "Fields": [
          {
            "Key": "type",
            "StringValue": "EmrCluster"
          },
          {
            "Key": "coreInstanceType",
            "StringValue": "i3.xlarge"
          },
          {
            "Key": "coreInstanceCount",
            "StringValue": "1"
          },
          {
            "Key": "masterInstanceType",
            "StringValue": "i3.xlarge"
          },
          {
            "Key": "terminateAfter",
            "StringValue": "1 hour"
          },
          {
            "Key": "resourceRole",
            "StringValue": "EMR_EC2_DefaultRole"
          },
          {
            "Key": "role",
            "StringValue": "EMR_DefaultRole"
          },
          {
            "Key": "subnetId",
            "StringValue": {
              "Ref": "VpcPrivateSubnet1Subnet536B997F"
            }
          },
          {
            "Key": "emrManagedMasterSecurityGroupId",
            "StringValue": {
              "Ref": "EMRControllerC4OFF237"
            }
          },
          {
            "Key": "emrManagedSlaveSecurityGroupId",
            "StringValue": {
              "Ref": "EMRWorkerE1C2639A"
            }
          },
          {
            "Key": "serviceAccessSecurityGroupId",
            "StringValue": {
              "Ref": "EMRServiceAccessB1B4D1B5"
            }
          },
          {
            "Key": "releaseLabel",
            "StringValue": "emr-5.30.0"
          },
          {
            "Key": "configuration",
            "RefValue": "SparkConfiguration"
          }
        ],
        "Id": "SampleEMRCluster",
        "Name": "SampleEMRCluster"
      },
      {
        "Fields": [
          {
            "Key": "type",
            "StringValue": "EmrConfiguration"
          },
          {
            "Key": "classification",
            "StringValue": "spark"
          },
          {
            "Key": "property",
            "RefValue": "sparkProperty01"
          }
        ],
        "Id": "SparkConfiguration",
        "Name": "SparkConfiguration"
      },
      {
        "Fields": [
          {
            "Key": "type",
            "StringValue": "Property"
          },
          {
            "Key": "key",
            "StringValue": "maximizeResourceAllocation"
          },
          {
            "Key": "value",
            "StringValue": "true"
          }
        ],
        "Id": "sparkProperty01",
        "Name": "sparkHiveSiteProperty01"
      },
      ... // other pipeline objects
    ]

Can someone help me understand what is wrong in the template?


Solution

  • Some fields cannot be edited once the pipeline has been created in AWS; adding "configuration" and changing EMR step dependencies appear to be among them. The template itself is fine: the error comes from updating an existing pipeline, not from the definition. Manually deleting the stack in the console and redeploying worked. AWS documents which fields cannot be edited here: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html
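
To make the workaround concrete, here is a minimal Python sketch, not the service's actual validation logic. It assumes the rule can be approximated as "an object that already exists in the deployed definition gains a new RefValue field", which matches the error above but is only a guess at how Data Pipeline checks it. The helper names (`added_reference_fields`, `delete_stack_and_wait`) and the trimmed `old`/`new` definitions are hypothetical; the only real AWS calls are boto3's CloudFormation `delete_stack` and the `stack_delete_complete` waiter.

```python
from typing import Any


def added_reference_fields(
    old_objects: list[dict[str, Any]],
    new_objects: list[dict[str, Any]],
) -> dict[str, list[str]]:
    """Return {object Id: [field keys]} for RefValue fields that are new on
    objects already present in the old definition (an approximation of the
    rule behind the validation error, not the service's own check)."""
    old_by_id = {obj["Id"]: obj for obj in old_objects}
    flagged: dict[str, list[str]] = {}
    for obj in new_objects:
        previous = old_by_id.get(obj["Id"])
        if previous is None:
            continue  # a brand-new object is not an "existing object"
        old_ref_keys = {f["Key"] for f in previous["Fields"] if "RefValue" in f}
        new_ref_keys = [
            f["Key"]
            for f in obj["Fields"]
            if "RefValue" in f and f["Key"] not in old_ref_keys
        ]
        if new_ref_keys:
            flagged[obj["Id"]] = new_ref_keys
    return flagged


def delete_stack_and_wait(cfn: Any, stack_name: str) -> None:
    """Delete the stack and block until it is gone, so the next deploy
    creates the pipeline from scratch instead of updating it.

    `cfn` is expected to be a boto3 CloudFormation client,
    e.g. boto3.client("cloudformation").
    """
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)


# Trimmed versions of the definition before and after the change.
old = [
    {"Id": "SampleEMRCluster", "Fields": [
        {"Key": "type", "StringValue": "EmrCluster"},
        {"Key": "releaseLabel", "StringValue": "emr-5.30.0"},
    ]},
]
new = [
    {"Id": "SampleEMRCluster", "Fields": [
        {"Key": "type", "StringValue": "EmrCluster"},
        {"Key": "releaseLabel", "StringValue": "emr-5.30.0"},
        {"Key": "configuration", "RefValue": "SparkConfiguration"},
    ]},
    {"Id": "SparkConfiguration", "Fields": [
        {"Key": "type", "StringValue": "EmrConfiguration"},
        {"Key": "classification", "StringValue": "spark"},
        {"Key": "property", "RefValue": "sparkProperty01"},
    ]},
]

print(added_reference_fields(old, new))  # {'SampleEMRCluster': ['configuration']}
```

If the check flags anything, calling `delete_stack_and_wait(boto3.client("cloudformation"), "<your-stack-name>")` before the next deploy reproduces the manual fix. Deleting the stack also deletes the pipeline it created, so this only suits cases where recreating the pipeline is acceptable.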