Tags: python, amazon-web-services, boto3, aws-glue

AWS Glue returns 'ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden' when using Boto3


I am running a Glue job in AWS account 'A' and accessing a file in S3 in AWS account 'B'. I have access to both accounts. When I run the code to pull the file:

import boto3

bucket = 'bucket-name'
filename = 's3-file-name.rdb'
local_filename = 'temp.rdb'

s3 = boto3.client('s3')  # uses the Glue job's IAM role credentials

s3.download_file(bucket, filename, local_filename) # Line 127 seen in error message

I get the following error:

  File "/tmp/myScript.py", line 127, in <module>
    s3.download_file(bucket, filename, local_filename)
  File "/home/spark/.local/lib/python3.10/site-packages/boto3/s3/inject.py", line 190, in download_file
    return transfer.download_file(
  File "/home/spark/.local/lib/python3.10/site-packages/boto3/s3/transfer.py", line 320, in download_file
    future.result()
  File "/home/spark/.local/lib/python3.10/site-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/home/spark/.local/lib/python3.10/site-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/home/spark/.local/lib/python3.10/site-packages/s3transfer/tasks.py", line 269, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/home/spark/.local/lib/python3.10/site-packages/s3transfer/download.py", line 354, in _submit
    response = client.head_object(
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
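The traceback shows that `download_file` issues a `HeadObject` request before transferring anything. Because a HEAD response has no body, botocore reports the bare HTTP status (`403`) as the error code. A minimal sketch for separating "forbidden" from "missing" before calling `download_file` (the helper names `classify_head_error` and `check_object` are hypothetical, not part of boto3):

```python
def classify_head_error(error_response):
    """Map the `response` dict of a ClientError raised by HeadObject to a label."""
    code = str(error_response.get("Error", {}).get("Code", ""))
    status = error_response.get("ResponseMetadata", {}).get("HTTPStatusCode")
    # HEAD responses carry no XML body, so the code is usually just "403"/"404".
    if code in ("403", "AccessDenied", "Forbidden") or status == 403:
        return "forbidden"
    if code in ("404", "NoSuchKey", "NotFound") or status == 404:
        return "missing"
    return "other"


def check_object(bucket, key):
    """Run HeadObject and return 'ok', 'forbidden', 'missing' or 'other'."""
    # Imported here so classify_head_error() can be used without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        return classify_head_error(err.response)
    return "ok"
```

In the Glue script, `check_object(bucket, filename)` could be called just before `s3.download_file(...)`. As S3 documents it, a missing key comes back as 403 rather than 404 when the caller lacks `s3:ListBucket`; since the bucket policy grants `s3:List*`, a `forbidden` result here points to an object that exists but cannot be read.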

I am aware of the IAM access needed, and I have checked and confirmed that I conform with everything mentioned in the accepted answer of "boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden", but to no avail.

I can confirm that the IAM policy attached to the Glue job has s3:read and s3:get access, and I have double-checked the policy on the bucket; it contains the following:

{
    "Sid": "ReadOnlyFromAccountA",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_A_ID:root"
    },
    "Action": [
      "s3:List*",
      "s3:Get*"
    ],
    "Resource": [
      "arn:aws:s3:::required-bucket",
      "arn:aws:s3:::required-bucket/*"
    ]
}

Here is the IAM role policy attached to the Glue job:

{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "*",
                        "glue:*"
                    ],
                    "Effect": "Allow",
                    "Resource": "*"
                },
                {
                    "Action": [
                        "s3:DeleteObject",
                        "s3:DeleteBucket",
                        "s3:DeleteBucketPolicy",
                        "iam:DeleteUser",
                        "organizations:*"
                    ],
                    "Effect": "Deny",
                    "Resource": "*"
                }
            ]
        },
        "VersionId": "v3",
        "IsDefaultVersion": true,
        "CreateDate": "2023-09-15T12:16:58Z"
    }
}
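`HeadObject` is authorized by the `s3:GetObject` permission. As a rough sanity check of the two policies above, the sketch below matches that action against the `Action` patterns in each statement. It is a deliberately simplified model (the helper `action_allowed` is hypothetical): it ignores `Resource`, `Principal`, `Condition`, `NotAction` and the rest of real IAM policy evaluation.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single string or a list for Action; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def action_allowed(statements, action):
    """Simplified check: an explicit Deny wins, otherwise any matching Allow grants.

    Only the Effect and Action keys are considered.
    """
    action = action.lower()  # IAM action names are case-insensitive

    def matches(stmt):
        patterns = _as_list(stmt.get("Action", []))
        return any(fnmatchcase(action, p.lower()) for p in patterns)

    if any(s.get("Effect") == "Deny" and matches(s) for s in statements):
        return False
    return any(s.get("Effect") == "Allow" and matches(s) for s in statements)


# Statements copied from the Glue role policy above (account A).
glue_role_statements = [
    {"Action": ["*", "glue:*"], "Effect": "Allow", "Resource": "*"},
    {"Action": ["s3:DeleteObject", "s3:DeleteBucket", "s3:DeleteBucketPolicy",
                "iam:DeleteUser", "organizations:*"],
     "Effect": "Deny", "Resource": "*"},
]

# Statement copied from the bucket policy above (account B).
bucket_policy_statements = [
    {"Sid": "ReadOnlyFromAccountA", "Effect": "Allow",
     "Action": ["s3:List*", "s3:Get*"]},
]

for name, stmts in [("Glue role", glue_role_statements),
                    ("bucket policy", bucket_policy_statements)]:
    print(name, "allows s3:GetObject:", action_allowed(stmts, "s3:GetObject"))
```

Both checks come out as allowed, which is consistent with the policies themselves not being the cause of the 403.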

Interestingly, when I run the following code locally, it works without issue:

import boto3

bucket = 'bucket-name'
filename = 's3-file-name.rdb'
local_filename = 'temp.rdb'

# Use the account 'B' credentials stored in ~/.aws/credentials
session = boto3.Session(profile_name='AWS_ACCOUNT_B')
s3 = session.client('s3')

s3.download_file(bucket, filename, local_filename)

Since I am running this locally, I have to specify the account 'B' credentials from my ~/.aws/credentials. Again, this downloads the file without any issue or error, which led me to believe that everything should also work in Glue: the IAM role will provide boto3 with the required credentials, and it is allowed to access account 'B'.
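Note that the two runs are not equivalent: locally the request is made as an identity in account B (the bucket owner), while in Glue it is made as the job's role in account A, i.e. a cross-account request. A minimal sketch for making that visible from each environment, using STS `GetCallerIdentity` (which needs no IAM permissions). The helper names and the `ACCOUNT_B_ID` placeholder are hypothetical:

```python
def describe_identity(identity, bucket_owner_account_id):
    """Summarise a GetCallerIdentity response relative to the bucket owner's account."""
    caller_account = identity["Account"]
    relation = ("same account as the bucket owner"
                if caller_account == bucket_owner_account_id
                else "cross-account request")
    return f"{identity['Arn']} ({relation})"


def current_identity(profile_name=None):
    """Return GetCallerIdentity for a profile, or for the default credential chain."""
    import boto3  # imported lazily so describe_identity() works without boto3

    session = boto3.Session(profile_name=profile_name)
    return session.client("sts").get_caller_identity()
```

In the Glue job, `print(describe_identity(current_identity(), "ACCOUNT_B_ID"))` should report the job's assumed role in account A as a cross-account request, whereas locally `current_identity("AWS_ACCOUNT_B")` would report an account B identity.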

I am unsure why this is not working, and if anyone with a bigger brain than me could figure this out it would be much appreciated. If any other info is required, just drop a comment and I'll respond straight away.


Solution

  • The issue was with the ownership of the objects in the bucket. The file I was trying to pull was being created by an account that was new to me, Account C, and the IAM role did not have permission to access files created by that account. New objects that I (and seemingly everyone else in the organisation) uploaded to this bucket were created by Account D, which we did have permission to read from. This made the issue hard to debug, because it gave the impression that we had access and should be able to pull any file from this bucket. Updating the account used to push these files to S3 solved the issue.
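This matches S3's object ownership model as AWS documents it: when ACLs are enabled on a bucket (Object Ownership not set to "Bucket owner enforced"), an object uploaded by another account is owned by the uploading account, and the bucket owner cannot grant access to it through the bucket policy. One way to spot such objects, assuming the caller has `s3:ListBucket` (granted here by `s3:List*`), is to list the bucket with `FetchOwner=True` and group the keys by owner. The helper names are hypothetical; `list_objects_v2` and its `FetchOwner` parameter are standard S3:

```python
from collections import defaultdict


def keys_by_owner(contents):
    """Group ListObjectsV2 'Contents' entries by the canonical ID of each object's owner."""
    groups = defaultdict(list)
    for obj in contents:
        owner_id = obj.get("Owner", {}).get("ID", "<owner not returned>")
        groups[owner_id].append(obj["Key"])
    return dict(groups)


def list_with_owners(bucket, prefix=""):
    """Yield every object under prefix, including its Owner field."""
    import boto3  # imported lazily so keys_by_owner() can be tested without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, FetchOwner=True):
        yield from page.get("Contents", [])
```

With `keys_by_owner(list_with_owners("bucket-name"))`, a key whose owner ID differs from the rest (like the file created by Account C here) should stand out.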