Unable to create an AWS Data Pipeline through a Serverless YAML template


I am creating a data pipeline to export a DynamoDB table to S3. The template given for Serverless YAML does not work with the "PAY_PER_REQUEST" billing mode.

I created one using the AWS console and it worked fine. I then exported its definition and tried to create the pipeline from that same definition in Serverless, but it gives me the following error:

ServerlessError: An error occurred: UrlReportDataPipeline - Pipeline Definition failed to validate because of following Errors: [{ObjectId = 'TableBackupActivity', errors = [Object references invalid id: 's3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}']}] and Warnings: [].

Can anyone help me with this? The pipeline created through the console works perfectly with the same value for the step in the table backup activity.

The pipeline template is pasted below:

UrlReportDataPipeline:
      Type: AWS::DataPipeline::Pipeline
      Properties: 
        Name: ***pipeline name****
        Activate: true
        ParameterObjects: 
          - Id: "myDDBReadThroughputRatio"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB read throughput ratio"
              - Key: "type"
                StringValue: "Double"
              - Key: "default"
                StringValue: "0.9"
          - Id: "myOutputS3Loc"
            Attributes: 
              - Key: "description"
                StringValue: "S3 output bucket"
              - Key: "type"
                StringValue: "AWS::S3::ObjectKey"
              - Key: "default"
                StringValue: 
                  !Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
          - Id: "myDDBTableName"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB Table Name"
              - Key: "type"
                StringValue: "String"
          - Id: "myDDBRegion"
            Attributes:
              - Key: "description"
                StringValue: "DynamoDB region"
        ParameterValues: 
          - Id: "myDDBTableName"
            StringValue: 
              Ref: "UrlReport"
          - Id: "myDDBRegion"
            StringValue: "eu-west-1"
        PipelineObjects: 
          - Id: "S3BackupLocation"
            Name: "Copy data to this S3 location"
            Fields: 
              - Key: "type"
                StringValue: "S3DataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "directoryPath"
                StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
          - Id: "DDBSourceTable"
            Name: "DDBSourceTable"
            Fields: 
              - Key: "tableName"
                StringValue: "#{myDDBTableName}"
              - Key: "type"
                StringValue: "DynamoDBDataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "readThroughputPercent"
                StringValue: "#{myDDBReadThroughputRatio}"
          - Id: "DDBExportFormat"
            Name: "DDBExportFormat"
            Fields: 
              - Key: "type"
                StringValue: "DynamoDBExportDataFormat"
          - Id: "TableBackupActivity"
            Name: "TableBackupActivity"
            Fields: 
              - Key: "resizeClusterBeforeRunning"
                StringValue: "true"
              - Key: "type"
                StringValue: "EmrActivity"
              - Key: "input"
                RefValue: "DDBSourceTable"
              - Key: "runsOn"
                RefValue: "EmrClusterForBackup"
              - Key: "output"
                RefValue: "S3BackupLocation"
              - Key: "step"
                RefValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
          - Id: "DefaultSchedule"
            Name: "Every 1 day"
            Fields: 
              - Key: "occurrences"
                StringValue: "1"
              - Key: "startDateTime"
                StringValue: "2020-09-17T1:00:00"
              - Key: "type"
                StringValue: "Schedule"
              - Key: "period"
                StringValue: "1 Day"
          - Id: "Default"
            Name: "Default"
            Fields: 
              - Key: "type"
                StringValue: "Default"
              - Key: "scheduleType"
                StringValue: "cron"
              - Key: "failureAndRerunMode"
                StringValue: "CASCADE"
              - Key: "role"
                StringValue: "DatapipelineDefaultRole"
              - Key: "resourceRole"
                StringValue: "DatapipelineDefaultResourceRole"
              - Key: "schedule"
                RefValue: "DefaultSchedule"
          - Id: "EmrClusterForBackup"
            Name: "EmrClusterForBackup"
            Fields: 
              - Key: "terminateAfter"
                StringValue: "2 Hours"
              - Key: "masterInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceCount"
                StringValue: "1"
              - Key: "type"
                StringValue: "EmrCluster"
              - Key: "releaseLabel"
                StringValue: "emr-5.23.0"
              - Key: "region"
                StringValue: "#{myDDBRegion}"

Solution

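  • I solved this with the AWS support team. Compared with the template in the question, the "step" field of TableBackupActivity is now a StringValue instead of a RefValue, and its last two arguments reference the pipeline parameters #{myDDBTableName} and #{myDDBReadThroughputRatio} directly instead of #{input.tableName} and #{input.readThroughputPercent}. As of today, the following YAML creates a data pipeline for on-demand (PAY_PER_REQUEST) DynamoDB tables.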

    You can also convert this to JSON if you want.

        UrlReportBucket:
          Type: AWS::S3::Bucket
          Properties:
            BucketName: ***bucketname***
    
        UrlReportDataPipeline:
          Type: AWS::DataPipeline::Pipeline
          Properties: 
            Name: ***pipelinename***
            Activate: true
            ParameterObjects: 
              - Id: "myDDBReadThroughputRatio"
                Attributes: 
                  - Key: "description"
                    StringValue: "DynamoDB read throughput ratio"
                  - Key: "type"
                    StringValue: "Double"
                  - Key: "default"
                    StringValue: "0.9"
              - Id: "myOutputS3Loc"
                Attributes: 
                  - Key: "description"
                    StringValue: "S3 output bucket"
                  - Key: "type"
                    StringValue: "AWS::S3::ObjectKey"
                  - Key: "default"
                    StringValue: 
                      !Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
              - Id: "myDDBTableName"
                Attributes: 
                  - Key: "description"
                    StringValue: "DynamoDB Table Name"
                  - Key: "type"
                    StringValue: "String"
              - Id: "myDDBRegion"
                Attributes:
                  - Key: "description"
                    StringValue: "DynamoDB region"
            ParameterValues: 
              - Id: "myDDBTableName"
                StringValue: 
                  Ref: "UrlReport"
              - Id: "myDDBRegion"
                StringValue: "eu-west-1"
            PipelineObjects: 
              - Id: "S3BackupLocation"
                Name: "Copy data to this S3 location"
                Fields: 
                  - Key: "type"
                    StringValue: "S3DataNode"
                  - Key: "dataFormat"
                    RefValue: "DDBExportFormat"
                  - Key: "directoryPath"
                    StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
              - Id: "DDBSourceTable"
                Name: "DDBSourceTable"
                Fields: 
                  - Key: "tableName"
                    StringValue: "#{myDDBTableName}"
                  - Key: "type"
                    StringValue: "DynamoDBDataNode"
                  - Key: "dataFormat"
                    RefValue: "DDBExportFormat"
                  - Key: "readThroughputPercent"
                    StringValue: "#{myDDBReadThroughputRatio}"
              - Id: "DDBExportFormat"
                Name: "DDBExportFormat"
                Fields: 
                  - Key: "type"
                    StringValue: "DynamoDBExportDataFormat"
              - Id: "TableBackupActivity"
                Name: "TableBackupActivity"
                Fields: 
                  - Key: "resizeClusterBeforeRunning"
                    StringValue: "true"
                  - Key: "type"
                    StringValue: "EmrActivity"
                  - Key: "input"
                    RefValue: "DDBSourceTable"
                  - Key: "runsOn"
                    RefValue: "EmrClusterForBackup"
                  - Key: "output"
                    RefValue: "S3BackupLocation"
                  - Key: "step"
                    StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
              - Id: "DefaultSchedule"
                Name: "Every 1 day"
                Fields: 
                  - Key: "occurrences"
                    StringValue: "1"
                  - Key: "startDateTime"
                    StringValue: "2020-09-23T1:00:00"
                  - Key: "type"
                    StringValue: "Schedule"
                  - Key: "period"
                    StringValue: "1 Day"
              - Id: "Default"
                Name: "Default"
                Fields: 
                  - Key: "type"
                    StringValue: "Default"
                  - Key: "scheduleType"
                    StringValue: "cron"
                  - Key: "failureAndRerunMode"
                    StringValue: "CASCADE"
                  - Key: "role"
                    StringValue: "DatapipelineDefaultRole"
                  - Key: "resourceRole"
                    StringValue: "DatapipelineDefaultResourceRole"
                  - Key: "schedule"
                    RefValue: "DefaultSchedule"
              - Id: "EmrClusterForBackup"
                Name: "EmrClusterForBackup"
                Fields: 
                  - Key: "terminateAfter"
                    StringValue: "2 Hours"
                  - Key: "masterInstanceType"
                    StringValue: "m3.xlarge"
                  - Key: "coreInstanceType"
                    StringValue: "m3.xlarge"
                  - Key: "coreInstanceCount"
                    StringValue: "1"
                  - Key: "type"
                    StringValue: "EmrCluster"
                  - Key: "releaseLabel"
                    StringValue: "emr-5.23.0"
                  - Key: "region"
                    StringValue: "#{myDDBRegion}"
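
    For quick reference, here is an excerpt showing only the part that differs from the template in the question. It is a fragment, not a complete pipeline definition, and the comments are my own reading of the validation error: a RefValue has to be the Id of another object in the same pipeline definition, so the validator tried to resolve the whole step string as an object Id and failed.

    ```yaml
    # Excerpt of the PipelineObjects list; all other objects stay as in the full template above.
    - Id: "TableBackupActivity"
      Name: "TableBackupActivity"
      Fields:
        # RefValue points at another pipeline object by its Id.
        - Key: "runsOn"
          RefValue: "EmrClusterForBackup"
        # The step is a literal command string, so it is passed as StringValue.
        # Its last two arguments use the pipeline parameters directly.
        - Key: "step"
          StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
    ```

    The same rule applies to the other fields: "input", "output", "runsOn", "dataFormat" and "schedule" stay as RefValue because each names an object Id defined elsewhere in PipelineObjects.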