amazon-s3 aws-lambda amazon-transcribe aws-permissions

Access denied when getting transcription


My setup is the following:

React-native app client -> AWS API Gateway -> AWS Lambda function -> AWS S3 -> AWS Transcribe -> AWS S3

I can upload an audio file to an S3 bucket from the Lambda function, start the transcription job, and even open the resulting transcript manually in the S3 bucket. However, when I try to fetch the JSON file with the transcription data using TranscriptFileUri, I get a 403 response.

On the S3 bucket that holds the transcriptions I have the following CORS configuration:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "PUT"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag"
        ]
    }
]

My lambda function code looks like this:

import time

import boto3
import requests

client = boto3.client('transcribe')

# jobName, s3Path and user_id are set earlier in the handler
response = client.start_transcription_job(
    TranscriptionJobName=jobName,
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={
        'MediaFileUri': s3Path
    },
    OutputBucketName='my-transcription-bucket',
    OutputKey=str(user_id) + '/'
)

# Poll until the job either completes or fails
while True:
    result = client.get_transcription_job(TranscriptionJobName=jobName)
    if result['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(5)

if result['TranscriptionJob']['TranscriptionJobStatus'] == "COMPLETED":
    # Fetch the transcript JSON over plain HTTP
    data = result['TranscriptionJob']['Transcript']['TranscriptFileUri']
    data = requests.get(data)
    print(data)

In CloudWatch, printing the response shows <Response [403]>.


Solution

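  • As far as I can tell, your code calls requests.get(data), where data is the TranscriptFileUri. What does that URI look like? Is it signed? I suspect not: when you set OutputBucketName, Transcribe writes the transcript to your own bucket and, as far as I know, returns a plain S3 URL rather than a presigned one. An unauthenticated GET with requests only works against a signed URL or a public object, so S3 answers with 403. The CORS configuration does not change this, because CORS rules do not grant access to objects; they only affect cross-origin requests made from browsers.

    You should use an authenticated mechanism instead, such as the S3 client's get_object.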
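
A minimal sketch of the get_object approach, under a few assumptions: that TranscriptFileUri is a path-style URL of the form https://s3.<region>.amazonaws.com/<bucket>/<key>, and that the Lambda execution role is allowed s3:GetObject on the output bucket. The helper names split_transcript_uri and fetch_transcript are hypothetical and only used for illustration; the S3 client is passed in so the function is easy to test.

```python
import json
from urllib.parse import unquote, urlparse


def split_transcript_uri(uri):
    """Split a path-style S3 URL into (bucket, key).

    Assumes the form https://s3.<region>.amazonaws.com/<bucket>/<key>.
    """
    path = urlparse(uri).path.lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"Unexpected transcript URI: {uri}")
    # The key may be percent-encoded in the URL, so decode it
    return bucket, unquote(key)


def fetch_transcript(s3_client, uri):
    """Download and parse the transcript JSON with an authenticated S3 call."""
    bucket, key = split_transcript_uri(uri)
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(obj["Body"].read())


# Possible usage inside the Lambda handler (replaces requests.get):
#   import boto3
#   s3 = boto3.client("s3")
#   uri = result['TranscriptionJob']['Transcript']['TranscriptFileUri']
#   transcript = fetch_transcript(s3, uri)
#   print(transcript["results"]["transcripts"][0]["transcript"])
```

Since the job was started with a known OutputBucketName and OutputKey, you could also build the bucket and key yourself instead of parsing the URI. If the client app needs to download the file directly, generating a presigned URL with the S3 client would be the alternative to reading it in the Lambda.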