We have the following scenario: AWS account A (application) writes data from an application to an S3 bucket owned by account B (data lake). The analysts in account C (reporting) want to process the data and build reports and dashboards on top of it.
Account A can write data to the data lake with --acl bucket-owner-full-control to give account B access. However, account C still cannot see or process the data.
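A minimal sketch of the write described above, done from code instead of the CLI. It assumes a boto3-style S3 client is passed in (for example `boto3.client("s3")`), and the bucket and key names in the usage are hypothetical. It only mirrors the `--acl bucket-owner-full-control` upload. In the situation the question describes, the object stays owned by account A, which is why account C still cannot read it.

```python
from typing import Any, Dict


def upload_to_data_lake(s3_client: Any, bucket: str, key: str, body: bytes) -> Dict[str, Any]:
    """Upload an object to account B's bucket, granting the bucket owner full control.

    Equivalent to `aws s3 cp ... --acl bucket-owner-full-control`. In the scenario
    from the question, the object remains owned by the writing account (A); the
    canned ACL only grants account B FULL_CONTROL on it.
    """
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ACL="bucket-owner-full-control",  # canned ACL for the bucket owner (account B)
    )
```

Usage (hypothetical names): `upload_to_data_lake(boto3.client("s3"), "data-lake-bucket", "app/data.json", b"{}")`. Passing the client in keeps the function independent of how credentials are obtained.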
One solution, which we consider bad, is to copy the data onto the same location (overwriting it) while acting as account B. This transfers ownership of the data to account B in the process and eliminates the issue. We don't want to do this because it's ugly.
We tried assuming roles across the accounts, but that does not work for all of our infrastructure. For example, S3 access via the CLI or the console works, but access from EMR in account C does not. We also have on-premises infrastructure (local task runners) where this mechanism is not an option.
Maintaining IAM roles for all accounts and users is too much effort. We are aiming for an automatic solution, not one that requires manual action every time a new user or account is added.
Do you have any suggestions?
In our case, we solved this with roles in the data lake account (B): one for write access ("WriterRole") and one for read access ("ReaderRole"). When writing to the data lake from account A, the writer assumes the "WriterRole" in account B, which has the required permissions. Because the objects are then written by a principal in account B, account B owns them, so the permissions of the "ReaderRole" apply to them. When reading from account C, you assume the "ReaderRole". We solved the issues with reading from EMR by using IAM roles for EMRFS requests (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html).
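A minimal sketch of both pieces of this answer, assuming boto3-style STS and EMR clients. The account ID `111111111111`, the role ARNs, and the S3 prefix are hypothetical placeholders. `assume_role_credentials` turns an STS `AssumeRole` response into keyword arguments for a new client. `emrfs_security_configuration` builds a security configuration that maps a data lake prefix to the ReaderRole for EMRFS, following the role-mapping format in the EMRFS documentation linked above. For the EMR case, the cluster's instance profile in account C would also need permission to assume the role in account B, and that role's trust policy would have to allow it.

```python
import json
from typing import Any, Dict, List

# Hypothetical role ARNs in the data lake account (B); replace with your own.
WRITER_ROLE_ARN = "arn:aws:iam::111111111111:role/WriterRole"
READER_ROLE_ARN = "arn:aws:iam::111111111111:role/ReaderRole"


def assume_role_credentials(sts_client: Any, role_arn: str, session_name: str) -> Dict[str, str]:
    """Assume a role in account B and return temporary credentials as client kwargs."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    # These keyword names are the ones boto3.client()/boto3.Session() accept.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def emrfs_security_configuration(role_arn: str, s3_prefixes: List[str]) -> str:
    """Build an EMR security configuration mapping S3 prefixes to an IAM role for EMRFS."""
    config = {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {
                        "Role": role_arn,
                        # EMRFS uses this role for requests to matching S3 prefixes.
                        "IdentifierType": "Prefix",
                        "Identifiers": s3_prefixes,
                    }
                ]
            }
        }
    }
    return json.dumps(config, indent=2)
```

Usage (hypothetical): a writer in account A could create its client with `boto3.client("s3", **assume_role_credentials(boto3.client("sts"), WRITER_ROLE_ARN, "app-writer"))`. The EMR side could register the mapping with `boto3.client("emr").create_security_configuration(Name="data-lake-reader", SecurityConfiguration=emrfs_security_configuration(READER_ROLE_ARN, ["s3://data-lake-bucket/"]))`.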