amazon-web-services amazon-s3 aws-lambda replication tagging

Amazon S3 tags for automatic replication with specific prefix?


I have two Amazon S3 buckets set up for cross-region replication. Whenever there is an upload to the source bucket with a specific prefix, I need the respective data to be replicated to my "processing bucket" in a different region. However, I need to know at least some information about the original source bucket after the replication process, because I want to set up multiple buckets that replicate to the same destination bucket, while the processing is going to be done via Lambda events.

I thought about getting this to work with tagging, but I can't find a way to automatically tag uploaded objects with a specific prefix before (or after?) they are replicated.

The closest thing I could find on this topic was https://docs.aws.amazon.com/AmazonS3/latest/dev/batch-ops-put-object-tagging.html, but I can't make much of it, as I'm not sure whether this is what I'm looking for, especially regarding the automatic replication functionality.

To recap: I want to process data via Lambda events and differentiate their origin by information included in the event's JSON data (originating from specific tags on the S3 object, for example).

What is the best way to approach this?


Solution

  • Tagging Objects

    How you tag objects depends on how they are uploaded to S3. If you are using the CLI, you can call the s3api commands to add tags after you have copied the file with aws s3 cp.

    aws s3api put-object-tagging --bucket [bucket name] --key [object key] --tagging 'TagSet=[{Key=mykey,Value=myvalue},{Key=yourkey,Value=yourvalue}]'
    

    Alternatively, you could add a Lambda trigger that adds the tags to the object when it is uploaded. You can do this using the examples outlined here.
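
    A minimal sketch of such a trigger, assuming the function is subscribed to the source bucket's object-created notifications. The prefix `incoming/`, the tag `origin-bucket`, and the helper name `tag_matching_objects` are hypothetical and only illustrate the idea; the S3 client is passed in so the logic can be exercised without AWS access.

```python
from urllib.parse import unquote_plus

# Hypothetical values for illustration; neither appears in the original answer.
WATCHED_PREFIX = "incoming/"
ORIGIN_TAGS = {"origin-bucket": "source-bucket-1"}


def tag_matching_objects(event, s3_client, prefix=WATCHED_PREFIX, tags=ORIGIN_TAGS):
    """Tag every object in an S3 event whose key starts with `prefix`.

    Returns the list of keys that were tagged.
    """
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    tagged = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.startswith(prefix):
            continue
        # put_object_tagging replaces the object's whole tag set.
        s3_client.put_object_tagging(
            Bucket=bucket, Key=key, Tagging={"TagSet": tag_set}
        )
        tagged.append(key)
    return tagged


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above can be tested without AWS

    return {"tagged": tag_matching_objects(event, boto3.client("s3"))}
```

    Because `put_object_tagging` overwrites any existing tags, a function like this would need to merge tags first if uploaders also set their own.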

    Bucket Replication:

    Objects are replicated as is. For the destination you can set only the encryption, the storage class, or the ownership; currently you can't change anything else.

    The AWS documentation for replication defines the destination configuration as:

    {
      "AccessControlTranslation" : AccessControlTranslation,
      "Account" : String,
      "Bucket" : String,
      "EncryptionConfiguration" : EncryptionConfiguration,
      "StorageClass" : String
    }
    

    Currently you can only set the destination's AccessControlTranslation, Account, Bucket, EncryptionConfiguration, and StorageClass. The Bucket value identifies only the destination bucket and does not include a prefix.
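
    As a rough illustration of where a prefix can be specified, here is a sketch that builds a replication configuration for boto3's `put_bucket_replication`. The helper names and the rule ID format are assumptions made for this example; the prefix filter sits on the rule, while the destination carries only the bucket ARN and storage class.

```python
def build_replication_config(role_arn, dest_bucket_arn, prefix, storage_class="STANDARD"):
    """Build a ReplicationConfiguration dict for put_bucket_replication."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": f"replicate-{prefix.strip('/')}",  # hypothetical naming scheme
                "Priority": 1,
                "Status": "Enabled",
                # The prefix filter belongs to the rule, not to the destination.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "StorageClass": storage_class,
                },
            }
        ],
    }


def apply_replication(s3_client, source_bucket, config):
    # Both buckets need versioning enabled before this call succeeds.
    s3_client.put_bucket_replication(
        Bucket=source_bucket, ReplicationConfiguration=config
    )
```

    With a boto3 client this could be applied as `apply_replication(boto3.client("s3"), "my-source-bucket", config)`, where the bucket name is again a placeholder.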

    If the correct permissions are set, replication also replicates tags, and tags can be added at any time. For example, you can add an object, let it replicate, and then update the source object's tags; the updated tags will replicate as well.

    Note: If you update the destination object's tags and the source object's tags are updated afterwards, the source tags will override the destination tags. This depends on the IAM policy defined; for example, if ownership has changed, you might not be able to update the tags.
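
    S3 event notification records carry the bucket name and object key but not the object's tags, so a processing function that relies on replicated tags has to look them up itself. A minimal sketch, assuming the function receives standard S3 event records and has permission to read tags; `tags_for_record` is a hypothetical helper name.

```python
from urllib.parse import unquote_plus


def tags_for_record(record, s3_client):
    """Return the tags of the object named in one S3 event record as a dict."""
    bucket = record["s3"]["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded.
    key = unquote_plus(record["s3"]["object"]["key"])
    response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    return {tag["Key"]: tag["Value"] for tag in response.get("TagSet", [])}
```

    In a handler this would be called once per entry in `event["Records"]`, with a client created via `boto3.client("s3")`.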

    Amazon S3 does not have the concept of folders; prefixes are just part of the key name, so the entire key name is replicated.

    Possible Solutions:

    In the source bucket, you could use a key prefix, for example 'my-source', and then filter the replication rule on the prefix 'my-source'. S3 replication will replicate the object to the target bucket with the same 'my-source' prefix. Thus, if bucket 1 uses keys such as 'my-source1/object' and bucket 2 uses 'my-source2/object', the target bucket will show the "folders" 'my-source1' and 'my-source2' with their respective objects. But if both source buckets use the same prefix, the files will appear in the same "folder" on the target.
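
    With distinct prefixes per source bucket, the processing function can derive the origin directly from the object key in the event, without reading any tags. A small sketch under that assumption; the mapping and the function name `origin_of` are hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from top-level prefix to the bucket it was replicated from.
PREFIX_TO_SOURCE = {"my-source1": "bucket 1", "my-source2": "bucket 2"}


def origin_of(record, mapping=PREFIX_TO_SOURCE):
    """Return the source for one S3 event record, or None if the prefix is unknown."""
    key = unquote_plus(record["s3"]["object"]["key"])
    top_level = key.split("/", 1)[0]
    return mapping.get(top_level)


def lambda_handler(event, context):
    # One origin per record, in the order the records arrived.
    return [origin_of(record) for record in event.get("Records", [])]
```

    This only works if every source bucket keeps its own prefix; two buckets sharing a prefix would be indistinguishable here.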

    Alternatively, you can use Lambda to change the prefix or to add tags as described above.