Tags: docker, jenkins, kubernetes, ssh, openssh

SSH_AUTH_SOCK not working in a Docker container run inside a Jenkins worker pod (Docker-in-Docker)


I have a Jenkins pipeline in which I need to clone a repo from GitHub over SSH inside a Docker container. The Jenkins worker is a Kubernetes pod, so to make the SSH key available for cloning the Git repo I mount SSH_AUTH_SOCK into the Docker container. Below is my Jenkins pipeline. (The sample below only checks whether the SSH key is available inside the container; it is not the actual use case.)

pipeline {
  agent {
    kubernetes {
      yaml '''
      apiVersion: v1
      kind: Pod
      spec:
        containers:
        - name: alpine
          image: pgedara/alpine-docker:3.0
          command:
            - cat
          tty: true
          imagePullPolicy: Always
          securityContext:
            privileged: true
          volumeMounts:
          - name: docker-socket
            mountPath: /var/run
        - name: docker-daemon
          image: docker:dind
          securityContext:
            privileged: true
          volumeMounts:
          - name: docker-socket
            mountPath: /var/run
        volumes:
        - name: docker-socket
          emptyDir: {}
      '''
    }
  }
  stages {
    stage('ssh-auth-sock-test') {
      steps {
        container('alpine') {
          sshagent(['my-ssh-key']) {
            sh(script: "docker run -v ${SSH_AUTH_SOCK}:/ssh_auth_sock -e SSH_AUTH_SOCK=/ssh_auth_sock pgedara/alpine-git:5.0 ssh-add -L")
          }
        }
      }
    }
  }
}

When I run this, I get the error below soon after the pgedara/alpine-git:5.0 image is pulled.

Status: Downloaded newer image for pgedara/alpine-git:5.0
Error connecting to agent: Connection refused

However, if I replace the kubernetes block inside the agent block with a label so that the job runs on an EC2 worker, ssh-add -L prints out the public SSH key.

pipeline {
  agent {
    label 'ubuntu' // This spins up an EC2 instance as a worker node
  }
  stages {
    stage('ssh-auth-sock-test') {
      steps {
        //container('alpine') {
          sshagent(['my-ssh-key']) {
            sh(script: "docker run -v ${SSH_AUTH_SOCK}:/ssh_auth_sock -e SSH_AUTH_SOCK=/ssh_auth_sock pgedara/alpine-git:5.0 ssh-add -L")
          }
        //}
      }
    }
  }
}

Below is the Dockerfile for pgedara/alpine-docker:3.0

FROM alpine
RUN apk add --no-cache --update bash docker openssh sudo acl git
RUN rm -f /etc/ssh/ssh_config
ADD config ~/.ssh
ADD ssh_config /etc/ssh/
RUN mkdir -p  ~/.ssh && ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts

Below is the content of both the config and the ssh_config file:

Host *
  ForwardAgent yes

Below is the Dockerfile for pgedara/alpine-git:5.0

FROM alpine
RUN apk add --no-cache --update git openssh sudo bash
RUN mkdir -p  ~/.ssh && ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts

I would appreciate it if someone could explain why I get the above error when using a pod template for the Jenkins worker. If anybody needs more details to understand the problem, please let me know.

Thanks!


Solution

  • I had the same problem with a similar setup, and I might have the solution. After some digging, I found that the path created for SSH_AUTH_SOCK was under the /tmp directory, but that directory isn't shared between the two containers in your Pod declaration. If you set up another emptyDir volume mounted at /tmp in both containers, similar to what you've done for the docker-socket mount, things should start working; see the sketch after this answer.

    Something else that helped me figure this out was switching from -v to --mount for the docker run volume option. With a source path that does not exist, -v silently creates it as an empty directory on the host, so the mounted path will not be a socket; --mount, on the other hand, reports an error when the source path does not exist.
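
    Below is a minimal sketch of what that could look like, based on the pod template from the question; the volume name tmp-dir and the assumption that the forwarded socket lives under /tmp are illustrative, not something mandated by the plugins:

    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: alpine
        image: pgedara/alpine-docker:3.0
        command:
          - cat
        tty: true
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run
        - name: tmp-dir          # share /tmp so the SSH_AUTH_SOCK socket created there is visible to the Docker daemon
          mountPath: /tmp
      - name: docker-daemon
        image: docker:dind
        securityContext:
          privileged: true
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run
        - name: tmp-dir
          mountPath: /tmp
      volumes:
      - name: docker-socket
        emptyDir: {}
      - name: tmp-dir            # second shared emptyDir, analogous to docker-socket
        emptyDir: {}

    And the docker run step rewritten with --mount, which fails loudly if the socket path is missing instead of silently creating an empty directory:

    sh(script: "docker run --mount type=bind,source=${SSH_AUTH_SOCK},target=/ssh_auth_sock -e SSH_AUTH_SOCK=/ssh_auth_sock pgedara/alpine-git:5.0 ssh-add -L")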