aws-api-gateway amazon-cognito terraform terraform-provider-aws

AWS Cognito Authorizer Error 500 Execution failed due to an internal error - null value in JWT claim

I'm having an issue with a Cognito authorizer and have run out of testing options (that I can think of), so I wondered if anyone had experienced similar issues. I've searched the forums, and previous instances seem to have been related to an AWS incident and to have "resolved themselves". My issue has been going on for over a week.

I have 3 Cognito User Pools built using Terraform (sorry, CloudFormation) and attached to different REST APIs as Cognito Authorizers in API Gateway. I have another 3, almost identical (apart from their names), attached to another 3 APIs.

If I take a valid JWT, obtained using AWS Amplify (or using the Cognito API directly), and test the authorizer using the console, test it using the CLI, or make an API request to an endpoint with auth turned on, I get the following:

{
    "clientStatus": 500,
    "log": "Execution failed due to an internal error",
    "latency": 28
}

I've turned API Gateway logging on and it provides little insight:

00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Extended Request Id: QOP7MG0ELPEFUBg=
00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Starting authorizer: x1rebc for request: 63aac040-e610-11e8-a304-1dab6e773ddd
00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Execution failed due to an internal error
00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Gateway response type: DEFAULT_5XX with status code: 500
00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Gateway response body: {"message":null}
00:17:50 (63aac040-e610-11e8-a304-1dab6e773ddd) Gateway response headers: {Access-Control-Allow-Origin=*, Access-Control-Allow-Headers=Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token, Access-Control-Allow-Methods=GET,OPTIONS}

If I copy the Terraform script, deploy another User Pool and Authorizer, and attach that to the broken API endpoint, then all is fine. If I attach one of the other 3 authorizers that are already deployed to the broken API endpoint, then all is fine as well.

If I attach the authorizer from a broken endpoint to a working API endpoint that has auth turned on (in another API, where it works with a working authorizer), then that endpoint breaks too. This suggests to me that it is a Cognito issue, and one I can't get any logs about!

If it were almost any other AWS resource I'd bin it, redeploy and start again. However, understanding the root cause of this is quite important to me, since binning a production user pool and all of the users and their details, which can't be exported or migrated (as far as I know) AND reconfiguring a web app to use new Cognito App and User Pool IDs which can't be statically mapped (as far as I know) isn't something I want to risk in a production environment.

Any further info or pointers would be very much appreciated! Thanks,

Tom

Solution

If anyone comes across this, I have just got to the bottom of the issue (having worked around it months ago).

I was creating a JWT claim with a null value in the pre token generation trigger, e.g.

{ "field": null }

Cognito is fine with this and sends the token back when you log in, complete with this null value. When you then present that token to the API Gateway authorizer, however, it falls over and gives you a mysterious "internal error" with no details.
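If you want to check whether a token you already hold is affected, a rough sketch like the one below may help. It decodes the JWT payload and lists any claims whose value is null. The function name `null_claims` is my own, and the code deliberately does not verify the signature, so treat it as a debugging aid only.

```python
import base64
import json


def null_claims(jwt_token):
    """Return the names of payload claims whose value is JSON null.

    Debugging aid only: the signature is NOT verified here.
    """
    # A JWT is header.payload.signature, each part base64url encoded.
    payload_b64 = jwt_token.split(".")[1]
    # base64url in JWTs drops the padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return [name for name, value in payload.items() if value is None]
```

Calling `null_claims(token)` on an ID token issued by the affected user pool should return the offending claim names, or an empty list if there are none.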

Changing to the following fixes the problem:

{ "field": "null" }
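For anyone whose trigger is written in Python, here is a minimal sketch of how the same fix could be applied generically. It assumes the standard pre token generation event shape, where custom claims go under `response.claimsOverrideDetails.claimsToAddOrOverride`. The helper name `sanitize_claims` and the example claim are hypothetical, not taken from my actual trigger.

```python
def sanitize_claims(claims, placeholder="null"):
    """Return a copy of claims with None values replaced by a string."""
    return {
        name: placeholder if value is None else value
        for name, value in claims.items()
    }


def lambda_handler(event, context):
    # Hypothetical claim: "field" stands in for whatever the trigger computes,
    # which may end up as None when the underlying data is missing.
    claims = {"field": None}

    # Replace None before handing the claims back to Cognito, so the
    # issued token never carries a null-valued claim.
    event["response"]["claimsOverrideDetails"] = {
        "claimsToAddOrOverride": sanitize_claims(claims)
    }
    return event
```

Omitting the claim entirely when the value is missing would probably work too, but I have only confirmed the string replacement shown above.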