I am encountering an issue with my ECS service where tasks are consistently failing during deployment. The specific error message I receive is as follows:
ResourceInitializationError: failed to download env files: file download command: non empty error stream: service call has been retried 5 time(s): RequestCanceled: request context canceled caused by: context deadline exceeded
If you have any opinions on this issue or need additional information, please let me know immediately.
Please note that I am not a native English speaker, so I apologize for any lack of clarity in my explanation. I appreciate your understanding.
I suspect that the issue might be related to timeout settings, as the error indicates that the request is canceled after multiple retries due to a context deadline being exceeded. I have tried setting the startTimeout and stopTimeout in the task definition JSON to 120 seconds, but this has not resolved the issue.
Previously, we experienced similar issues, but they were automatically resolved through the retry process, so we did not make any changes. However, after adding a new scheduled task, this task did not operate successfully.
In an attempt to address this, as mentioned above, I revised the task definition after registering the new scheduled task and deployed it. Unfortunately, the service deployment also failed with the same error. Ultimately, I rolled back to a previous revision by removing the timeout settings from the JSON, which allowed the service deployment to proceed. However, the scheduled task still fails with the same error.
My task's VPC had a private subnet set up without a NAT Gateway, so it couldn't access the S3 bucket. After creating a NAT Gateway and adding a new route to the existing routing table for the NAT Gateway, the issue was resolved.
I hope it helps someone who faces same issues.