
Docker container exits (code 255) with error "task already exists" and does not restart automatically


I have a basic container that opens an SSH tunnel to a machine.

Recently I noticed that the container had exited with code 255 and an error message saying the task already exists:

        "Id": "7eb92418992a1a1c3e44d6b47257dc503d4fa4d0f26050956533d617ac369479",
        "Created": "2022-08-29T18:19:41.286843867Z",
        "Path": "sh",
        "Args": [
            "-c",
            "apk update && apk add openssh-client &&\n       chmod 400 ~/.ssh/abc.pem\n       while true; do \n       exec ssh -o StrictHostKeyChecking=no  -i ~/.ssh/abc.pem -nNT -L *:33333:localhost:5001 [email protected]; \n       done"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 255,
            "Error": "task 7eb92418992a1a1c3e44d6b47257dc503d4fa4d0f26050956533d617ac369479: already exists",
            "StartedAt": "2022-08-30T19:43:58.575463029Z",
            "FinishedAt": "2022-08-30T19:51:23.511624168Z"
        },

More importantly, even though the restart policy is "always", the Docker engine did not restart the container after it exited. This is the compose service definition:

  abc:
    container_name: abc
    image: alpine:latest
    restart: always
    command: > 
      sh -c "apk update && apk add openssh-client &&
             chmod 400 ~/.ssh/${PEM_FILENAME}
             while true; do 
             exec ssh -o StrictHostKeyChecking=no  -i ~/.ssh/${PEM_FILENAME} -nNT -L *:33333:localhost:5001 abc@${IP}; 
             done"
    volumes:
      - ./ssh:/root/.ssh:rw
    expose:
      - 33333
  • Does anyone know under what circumstances the error "task already exists" can happen?
  • Also, any idea why the Docker engine did not restart the container after it exited?

Update 1:

  • Also, any idea why the Docker engine did not restart the container after it exited? [Answered by @Mihai] According to the restart policy details in the Docker docs:

A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.

Since we have:

            "StartedAt": "2022-08-30T19:43:58.575463029Z",
            "FinishedAt": "2022-08-30T19:51:23.511624168Z"

then FinishedAt - StartedAt ~ 8 seconds, which is under the 10-second threshold, and that is why the Docker engine did not restart the container. I don't think this is good logic, though: the engine should have a retry mechanism and attempt at least a few restarts before giving up.


Solution

  • I would suggest this solution:

    create a Dockerfile in an empty folder:

    FROM alpine:latest
    RUN apk update && apk add openssh-client
    

    build the image:

    docker build -t alpinessh .
    

    Run it with docker run:

    docker run -d \
      --restart "always" \
      --name alpine_ssh \
      -u $(id -u):$(id -g) \
      -v $HOME/.ssh:/user/.ssh \
      -p 33333:33333 \
      alpinessh \
      ssh -o StrictHostKeyChecking=no  -i /user/.ssh/${PEM_FILENAME} -nNT -L :33333:localhost:5001 abc@${IP}
    

    (make sure to set the env variables that you need)

    Running with docker-compose follows the same logic.
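For reference, a compose sketch of the same docker run command might look like the following. This is only a sketch: compose does not populate UID and GID by itself, so they are assumed to be exported on the host, and PEM_FILENAME and IP are the same environment variables as before:

```yaml
# Sketch: compose equivalent of the docker run command above.
services:
  alpine_ssh:
    image: alpinessh              # image built from the Dockerfile above
    restart: always
    user: "${UID}:${GID}"         # assumed: UID and GID exported on the host
    command: >
      ssh -o StrictHostKeyChecking=no -i /user/.ssh/${PEM_FILENAME}
      -nNT -L :33333:localhost:5001 abc@${IP}
    volumes:
      - ${HOME}/.ssh:/user/.ssh
    ports:
      - "33333:33333"
```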

    **NOTE**

    Mapping ~/.ssh into the container is not the best idea. It would be better to copy the key to a different location and use it from there. The reason: inside the container you are root, so any files the container creates in your ~/.ssh are created and owned by root (uid=0). For example known_hosts: if you don't already have one, you will end up with a fresh one owned by root.

    For this reason I am running the container as the current UID:GID on the host.
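A staging step along those lines might look like this sketch; the directory name container-ssh and the abc.pem fallback are illustrative, not part of the original setup:

```shell
# Stage only the key into a dedicated directory for the container to mount,
# leaving the real ~/.ssh (and its known_hosts) untouched.
PEM_FILENAME="${PEM_FILENAME:-abc.pem}"   # assumed key name
mkdir -p ./container-ssh
if [ -f "$HOME/.ssh/$PEM_FILENAME" ]; then
  cp "$HOME/.ssh/$PEM_FILENAME" ./container-ssh/
  chmod 400 "./container-ssh/$PEM_FILENAME"
fi
echo "staged key directory: ./container-ssh"
```

The container would then mount ./container-ssh instead of $HOME/.ssh, e.g. -v $(pwd)/container-ssh:/user/.ssh.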