I'm having trouble connecting to a Fargate container with ECS Exec (aws ecs execute-command); it fails with the following error:
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
I've made sure I have the right permissions and setup by running the ECS Exec checker script (check-ecs-exec.sh), and I'm connecting with the following command:
aws ecs execute-command --cluster {cluster-name} --task {task-id} --container {container-name} --interactive --command "/bin/bash"
I know this error usually shows up when the necessary permissions are missing, but as mentioned above I've already verified that with check-ecs-exec.sh, and here is its output:
-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
AWS CLI Version | OK (aws-cli/2.13.4 Python/3.11.4 Darwin/22.4.0 source/arm64 prompt/off)
Session Manager Plugin | OK (1.2.463.0)
-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-2
Cluster: cluster
Task : 47e51750712a4e1c832dd996c878f38a
-------------------------------------------------------------
Cluster Configuration | Audit Logging Not Configured
Can I ExecuteCommand? | arn:aws:iam::290319421751:role/aws-reserved/sso.amazonaws.com/eu-west-2/AWSReservedSSO_PowerUserAccess_01a9cfdb5ba4af7f
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Launch Type | Fargate
Platform Version | 1.4.0
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. RUNNING for "WebApp"
----------
Init Process Enabled (WebAppTaskDefinition:49)
----------
1. Enabled - "WebApp"
----------
Read-Only Root Filesystem (WebAppTaskDefinition:49)
----------
1. Disabled - "WebApp"
Task Role Permissions | arn:aws:iam::290319421751:role/task-role
ssmmessages:CreateControlChannel: allowed
ssmmessages:CreateDataChannel: allowed
ssmmessages:OpenControlChannel: allowed
ssmmessages:OpenDataChannel: allowed
VPC Endpoints |
Found existing endpoints for vpc-11122233444:
- com.amazonaws.eu-west-2.monitoring
- com.amazonaws.eu-west-2.ssmmessages
Environment Variables | (WebAppTaskDefinition:49)
1. container "WebApp"
- AWS_ACCESS_KEY: not defined
- AWS_ACCESS_KEY_ID: not defined
- AWS_SECRET_ACCESS_KEY: not defined
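For reference, this is roughly how the checker was invoked (a sketch, assuming the check-ecs-exec.sh script from the aws-containers/amazon-ecs-exec-checker repo; substitute your own cluster name and task ID):

./check-ecs-exec.sh {cluster-name} {task-id}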
What is weird about this situation is that the service is deployed to 4 environments and it works in all of them except one. The resources are identical, since the clusters are created from the same CloudFormation template, and the same image is deployed to all 4 environments.
Any ideas on what could cause this?
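If it helps, one quick way to compare environments that are supposed to be identical is to list the VPC endpoints in each environment's VPC and diff the results (a sketch; vpc-11122233444 is the VPC the checker reported above, swap in the other environments' VPC IDs):

aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=vpc-11122233444 --query 'VpcEndpoints[].ServiceName' --output text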
Update: it turned out there was a VPC endpoint set up for SSM in that environment, which was not needed in our case since the tasks already had public network access.
Weirdly enough, when we removed the VPC endpoint the problem went away. It may not have been set up correctly with the VPC endpoint security groups, so if you run into a situation similar to this one I encourage you to check whether you have misconfigured VPC endpoints for SSM and remove or fix them depending on your use case.
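If you'd rather inspect before removing anything, here is a rough sketch of checking the ssmmessages endpoint and its security groups (the {sg-id} and {vpce-id} values are placeholders):

# find the ssmmessages interface endpoint in the VPC and the security groups attached to it
aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=vpc-11122233444 Name=service-name,Values=com.amazonaws.eu-west-2.ssmmessages --query 'VpcEndpoints[].{Id:VpcEndpointId,SecurityGroups:Groups[].GroupId}'

# the endpoint's security group needs to allow inbound HTTPS (443) from the tasks for ECS Exec to work through it
aws ec2 describe-security-groups --group-ids {sg-id} --query 'SecurityGroups[].IpPermissions'

# or, if the endpoint is not needed at all (as in our case), delete it
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids {vpce-id}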