docker, github-actions, github-actions-runners

Self-hosted GitHub Actions runner jobs failing


We use GitHub self-hosted Actions runners on EC2 machines (m5.xlarge) as part of our CI/CD pipeline to support Docker image builds and automated testing. This setup has worked fine for the last year or so, but yesterday the builds suddenly started to fail with the following error message:

time="2023-02-03T12:00:13Z" level=error msg="error waiting for container: unexpected EOF"

My understanding is that this error typically means a Docker container has hit a resource limit (CPU or memory), but given that these are m5.xlarge instances (4 vCPUs and 16 GB of memory) I'm a little surprised. Our builds use npm, which I understand can be quite resource hungry, but monitoring a container during its execution showed that it was nowhere near the limits of the node:

[Image: resource usage of the container during a build]

I've tried cycling the nodes, but there is no difference in behaviour. The following user-data script is used with these nodes; it connects each node to our GitHub account and makes it available for jobs. I've also tried using the latest actions-runner package, but again, no change in behaviour. What other reasons could there be for this error? I'm a bit stumped.

#!/bin/sh
set -e

# Install Docker via the convenience script, then pip, jq and the AWS CLI
curl https://get.docker.com | bash
apt install -y python3-pip jq
pip3 install awscli

# Download and unpack the GitHub Actions runner, then hand it to the ubuntu user
mkdir actions-runner && cd actions-runner
curl -O -L https://github.com/actions/runner/releases/download/v2.286.0/actions-runner-linux-x64-2.286.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.286.0.tar.gz
chown -R ubuntu:ubuntu .

# Look up this instance's ID from the EC2 metadata service
instance_id="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

# Request a runner registration token from the GitHub API
url="https://api.github.com/orgs/<REMOVED>/actions/runners/registration-token"
token=$(curl -s -u "<REMOVED>:<REMOVED>" -X POST "$url" | jq -r .token)

# Register the runner, named after the instance, as the ubuntu user
sudo -u ubuntu ./config.sh \
  --name "products-stage-ec2-runner-$instance_id" \
  --token "$token" \
  --url "https://github.com/<REMOVED>" \
  --labels "<REMOVED>" \
  --unattended

# Install and start the runner as a service
sudo ./svc.sh install
sudo ./svc.sh start

Solution

  • See my comment for details of how I resolved this.
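
Separately from that fix, the question of what else can cause "error waiting for container: unexpected EOF" can be narrowed down with a few checks on the runner host. The following is only a sketch, not the resolution referred to above. The function names are hypothetical, and it assumes a systemd-based host where the Docker unit is called docker and the data directory is /var/lib/docker. Besides a container being OOM-killed, the message can also appear when the docker client loses its connection to the daemon (for example if dockerd restarts mid-build), so the daemon log around the failure time is worth reading.

```shell
#!/bin/sh
# Diagnostic sketch; all function names here are hypothetical helpers.

# Read log text on stdin; succeed if it contains kernel OOM-killer messages.
has_oom_lines() {
  grep -Eiq 'out of memory|oom-kill|killed process'
}

# Check the kernel log for OOM kills (may need root to read dmesg).
check_kernel_oom() {
  if dmesg 2>/dev/null | has_oom_lines; then
    echo "kernel log shows OOM kills"
  else
    echo "no OOM kills found in kernel log"
  fi
}

# Ask Docker whether a given container was OOM-killed, and its exit code.
# $1: container name or ID (the container must still exist).
check_container_oom() {
  docker inspect --format 'oom={{.State.OOMKilled}} exit={{.State.ExitCode}}' "$1"
}

# Show Docker daemon logs between two timestamps, to spot restarts or crashes.
# $1, $2: timestamps such as "2023-02-03 11:55:00" (assumes unit name "docker").
check_daemon_log() {
  journalctl -u docker --since "$1" --until "$2" --no-pager
}

# Report free space where Docker stores images and layers (assumed default path).
check_disk() {
  df -h /var/lib/docker
}
```

For the failure in the question, one might run `check_kernel_oom`, then `check_daemon_log "2023-02-03 11:55:00" "2023-02-03 12:05:00"`, and compare `docker version` on a failing runner with the version that was installed while builds still passed.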