
"Could not find a valid Docker environment" when running Java TestContainers in GitLab-CI and pulling docker:dind from a private registry


I am trying to run Java TestContainers from GitLab-CI, using Docker-in-Docker. For security reasons, I am only allowed to use images replicated to our internal enterprise container registry.

When I run the job with the following configuration:

test:
  image: internalregistry.azurecr.io/maven:3.9-eclipse-temurin-17
  services:
    - docker:24.0.5-dind
  stage: test
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
    DOCKER_DRIVER: overlay2
  before_script:
    - echo "hub.image.name.prefix=internalregistry.azurecr.io/" > $HOME/.testcontainers.properties
  script:
    - mvn clean test

the job works: the tests are executed and pass.

However, I created an internal variation of the image:

docker pull docker:24.0.5-dind
docker tag docker:24.0.5-dind internalregistry.azurecr.io/docker:24.0.5-dind
docker push internalregistry.azurecr.io/docker:24.0.5-dind

and when I change the docker service to point to the internal registry:

  services:
    - internalregistry.azurecr.io/docker:24.0.5-dind

the pipeline no longer succeeds and fails with the following error:

Could not find a valid Docker environment. Please check configuration. Attempted configurations were:\n\tUnixSocketClientProviderStrategy: failed with exception InvalidConfigurationException (Could not find unix domain socket). Root cause NoSuchFileException (/var/run/docker.sock)As no valid configuration was found, execution cannot continue.\nSee https://www.testcontainers.org/on_failure.html for more details.

During a successful run, I noticed that the following TestContainers strategy is used:

logger":"org.testcontainers.dockerclient.DockerClientProviderStrategy","message":"Found Docker environment with Environment variables, system properties and defaults. Resolved dockerHost=tcp://docker:2375"

I tried configuring the same strategy explicitly in the GitLab-CI job:

  before_script:
    - echo "hub.image.name.prefix=internalregistry.azurecr.io/" > $HOME/.testcontainers.properties
    - echo "docker.client.strategy=org.testcontainers.dockerclient.DockerClientProviderStrategy" >> $HOME/.testcontainers.properties

Unfortunately, even this configuration fails with the following error:

logger":"org.testcontainers.dockerclient.DockerClientProviderStrategy","message":"Can't instantiate a strategy from org.testcontainers.dockerclient.DockerClientProviderStrategy"

with the following underlying message:

{"timestamp":"2023-12-19T09:02:21.104Z","level":"WARN","thread":"main","logger":"org.testcontainers.dockerclient.DockerClientProviderStrategy","message":"DOCKER_HOST tcp://docker:2375 is not listening","context":"default"}

{"timestamp":"2023-12-19T09:02:21.280Z","level":"INFO","thread":"main","logger":"org.testcontainers.dockerclient.DockerMachineClientProviderStrategy","message":"docker-machine executable was not found on PATH ([/opt/java/openjdk/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin])","context":"default"}

{"timestamp":"2023-12-19T09:02:21.281Z","level":"ERROR","thread":"main","logger":"org.testcontainers.dockerclient.DockerClientProviderStrategy","message":"Could not find a valid Docker environment. Please check configuration. Attempted configurations were:\n\tUnixSocketClientProviderStrategy: failed with exception InvalidConfigurationException (Could not find unix domain socket). Root cause NoSuchFileException (/var/run/docker.sock)As no valid configuration was found, execution cannot continue.\nSee https://www.testcontainers.org/on_failure.html for more details.","context":"default"}


Solution

  • The problem

    As per the official GitLab documentation, each service is registered under a hostname that is inferred automatically from its image name:

    The default aliases for the service’s hostname are created from its image name following these rules:

    • Everything after the colon (:) is stripped.
    • Slash (/) is replaced with double underscores (__) and the primary alias is created.
    • Slash (/) is replaced with a single dash (-) and the secondary alias is created (requires GitLab Runner v1.1.0 or later).

    This means that when you used docker:24.0.5-dind from the public hub, the created alias was docker. When you started the service from the image in your internal registry, however, the created aliases were:

    1. internalregistry.azurecr.io-docker,
    2. internalregistry.azurecr.io__docker.
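The quoted rules can be sketched in a few lines of shell. This is only an illustration of the documented behaviour, not GitLab Runner's actual implementation; the function names are hypothetical, and the sketch assumes the image reference carries no registry port and no @digest suffix.

```shell
#!/bin/sh
# Sketch of the documented alias rules; not GitLab Runner's real code.
# Assumption: the image reference has no registry port and no @digest suffix.

# Strip everything from the first colon onward (the tag).
strip_tag() {
  printf '%s\n' "${1%%:*}"
}

# Primary alias: every slash becomes a double underscore.
primary_alias() {
  strip_tag "$1" | sed 's|/|__|g'
}

# Secondary alias: every slash becomes a single dash.
secondary_alias() {
  strip_tag "$1" | sed 's|/|-|g'
}

primary_alias "internalregistry.azurecr.io/docker:24.0.5-dind"    # internalregistry.azurecr.io__docker
secondary_alias "internalregistry.azurecr.io/docker:24.0.5-dind"  # internalregistry.azurecr.io-docker
primary_alias "docker:24.0.5-dind"                                # docker
```

For the public image both aliases collapse to docker, because the name contains no slash. That is why the original configuration worked without an explicit alias.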

    Now, when you define the DOCKER_HOST variable:

    DOCKER_HOST: tcp://docker:2375
    

    you are trying to reach a service named docker, but in your case the only hostnames available are the two aliases listed above.

  • The solution

    You can either reconfigure your DOCKER_HOST variable to use the automatically inferred alias:

    DOCKER_HOST: tcp://internalregistry.azurecr.io-docker:2375
    

    or (and this is the option I would recommend) set an explicit alias, docker. That way DOCKER_HOST stays the same, and the service created from the image in your internal registry is registered under the name given in the alias directive:

      services:
        - name: internalregistry.azurecr.io/docker:24.0.5-dind
          alias: docker
    

    Naturally, the value of alias must match the host in DOCKER_HOST (or in any other environment variable that refers to the service).
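A mismatch between the two values can be caught early with a small check, for example in before_script. The following is a minimal sketch; the helper names docker_host_name and host_matches_alias are hypothetical, and it assumes DOCKER_HOST uses the tcp://host:port form shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: extract the hostname from a tcp://host:port value.
docker_host_name() {
  value="${1#tcp://}"            # drop the scheme
  printf '%s\n' "${value%%:*}"   # drop the port
}

# Hypothetical check: succeeds only if DOCKER_HOST points at the given alias.
host_matches_alias() {
  [ "$(docker_host_name "$1")" = "$2" ]
}

if host_matches_alias "tcp://docker:2375" "docker"; then
  echo "DOCKER_HOST matches the service alias"
else
  echo "DOCKER_HOST does not match the service alias" >&2
fi
```

In a real job the first argument would be "$DOCKER_HOST" and the second the alias configured under services.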

    Also make sure the image in your internal registry is actually the same (i.e. has the same SHA256 digest) as docker:24.0.5-dind.
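One way to check this is to compare the local image IDs after pulling both references. The sketch below assumes a machine with a Docker daemon and pull access to both registries; image_id and same_image are hypothetical helper names, and the comparison uses the local image ID (a sha256 value) rather than the registry manifest digest.

```shell
#!/bin/sh
# Hypothetical helper: print the local image ID (a sha256 value) of a reference.
image_id() {
  docker image inspect --format '{{.Id}}' "$1"
}

# Succeeds only if both references exist locally and share the same image ID.
same_image() {
  first="$(image_id "$1")" || return 1
  second="$(image_id "$2")" || return 1
  [ -n "$first" ] && [ "$first" = "$second" ]
}

# Usage (requires a Docker daemon and both images pulled locally):
#   docker pull docker:24.0.5-dind
#   docker pull internalregistry.azurecr.io/docker:24.0.5-dind
#   same_image docker:24.0.5-dind internalregistry.azurecr.io/docker:24.0.5-dind \
#     && echo "images are identical"
```

Because docker tag only adds a new name to an existing image, both references should report the same ID if the internal copy was produced with the pull, tag and push steps from the question.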