amazon-web-services, aws-sdk, amazon-kinesis-firehose

How can I update a Firehose delivery stream's DataFormatConversionConfiguration using the AWS SDK?


Does anyone have a working example of using firehose.update_destination to set an S3 destination's DataFormatConversionConfiguration? I'm following the guidance in "Is it possible to specify data format conversion in AWS Cloudformation?", using boto3 (the AWS SDK for Python), but I haven't been successful. When I include a DataFormatConversionConfiguration (DFCC) in the ExtendedS3DestinationConfiguration argument, the call fails with the following error:

Exception during processing: An error occurred (InvalidArgumentException) when calling the UpdateDestination operation: RoleArn must not be null or empty

If I pass the original destination configuration (as returned by describe_delivery_stream) unchanged, the update succeeds. I can also change other config options, e.g. BufferingHints. The only time it fails is when DataFormatConversionConfiguration is non-null.

For example, passing this works:

{
  "RoleARN": "arn:aws:iam::1234567:role/MyExecutionRole",
  "BucketARN": "arn:aws:s3:::my-bucket",
  "Prefix": "databases/tables/requests/",
  "BufferingHints": {
    "SizeInMBs": 64,
    "IntervalInSeconds": 120
  },
  "CompressionFormat": "UNCOMPRESSED",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption"
  },
  "CloudWatchLoggingOptions": {
    "Enabled": false
  },
  "S3BackupMode": "Disabled"
}

but passing this fails:

{
  "RoleARN": "arn:aws:iam::1234567:role/MyExecutionRole",
  "BucketARN": "arn:aws:s3:::my-bucket",
  "Prefix": "databases/tables/requests/",
  "BufferingHints": {
    "SizeInMBs": 64,
    "IntervalInSeconds": 120
  },
  "CompressionFormat": "UNCOMPRESSED",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption"
  },
  "CloudWatchLoggingOptions": {
    "Enabled": false
  },
  "S3BackupMode": "Disabled",
  "DataFormatConversionConfiguration": {
    "InputFormatConfiguration": {
      "Deserializer": {
        "OpenXJsonSerDe": {
        }
      }
    },
    "SchemaConfiguration": {
      "TableName": "requests",
      "DatabaseName": "mydb"
    },
    "OutputFormatConfiguration": {
      "Serializer": {
        "OrcSerDe": {
        }
      }
    }
  }
}

The only difference is the DataFormatConversionConfiguration element.

Am I overlooking something obvious? Perhaps the DFCC element is malformed? I've not been able to find any working examples, so I'm going purely from documentation.

I'm also rather surprised by the use of RoleARN and BucketARN in the input element, rather than the usual convention of RoleArn and BucketArn, but I'm not sure whether that's germane.


Solution

  • As you suspected, your DataFormatConversionConfiguration is malformed.

    Perhaps confusingly, I think the RoleArn that the error reports as missing is DataFormatConversionConfiguration.SchemaConfiguration.RoleARN, not the top-level RoleARN, which your request already includes.

    I'm not going to copy it all here, but I find that the service's API reference is the best place to look for detail about the types the SDK uses: https://docs.aws.amazon.com/firehose/latest/APIReference/API_DataFormatConversionConfiguration.html
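
To make the suggestion above concrete, here is a minimal, untested sketch of how the fix might look in boto3. The helper names (with_schema_role, apply_conversion) are hypothetical, and reusing the delivery role ARN as the schema role is an assumption: whichever role you pass must be one Firehose can use to read the AWS Glue table. It also assumes the stream has a single destination and that a partial ExtendedS3DestinationUpdate is merged with the existing configuration.

```python
import copy


def with_schema_role(dfcc, role_arn):
    """Return a copy of a DataFormatConversionConfiguration dict with
    SchemaConfiguration.RoleARN filled in (hypothetical helper)."""
    updated = copy.deepcopy(dfcc)
    updated.setdefault("SchemaConfiguration", {})["RoleARN"] = role_arn
    return updated


def apply_conversion(stream_name, dfcc):
    """Send the conversion config to an existing stream (hypothetical helper)."""
    import boto3  # imported lazily so with_schema_role works without the SDK

    firehose = boto3.client("firehose")
    desc = firehose.describe_delivery_stream(
        DeliveryStreamName=stream_name
    )["DeliveryStreamDescription"]

    firehose.update_destination(
        DeliveryStreamName=stream_name,
        CurrentDeliveryStreamVersionId=desc["VersionId"],
        # Assumption: the stream has exactly one destination.
        DestinationId=desc["Destinations"][0]["DestinationId"],
        ExtendedS3DestinationUpdate={
            "DataFormatConversionConfiguration": dfcc,
        },
    )


# The DFCC from the question, before the fix.
question_dfcc = {
    "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
    "SchemaConfiguration": {"TableName": "requests", "DatabaseName": "mydb"},
    "OutputFormatConfiguration": {"Serializer": {"OrcSerDe": {}}},
}

# Assumption: the delivery role from the question can also read the Glue table.
fixed_dfcc = with_schema_role(
    question_dfcc, "arn:aws:iam::1234567:role/MyExecutionRole"
)
```

Calling apply_conversion("my-stream", fixed_dfcc), with your own stream name in place of the placeholder, would then issue the update. The rest of the destination settings from the question (BufferingHints, CompressionFormat, and so on) can be added to ExtendedS3DestinationUpdate in the same way if you prefer to send the full configuration.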