Tags: docker, kubernetes, docker-registry, google-container-registry, docker-in-docker

`docker push` to gcr.io fails in Kubernetes + Docker-in-docker + user-defined docker network


Background:

I'm using Drone to test an application. Drone is deployed to Kubernetes, with a Docker-in-Docker (dind) container running as a sidecar.

After the tests complete, I use Drone again to build and push several Docker images of about 40 MB each to us.gcr.io.

When Drone creates the docker container to test my application, and the separate container to build my application and images, it creates a docker network to link the containers to build services, like a temporary test database (pretty standard in a CI pipeline).

However, the combination of Kubernetes pod networking and Docker-in-Docker results in the following errors when trying to push to GCR:

time="2018-03-19T03:31:12.037507241Z" level=error msg="Upload failed, retrying: net/http: HTTP/1.x transport connection broken: write tcp w.x.y.z:39662->z.y.x.w:443: write: broken pipe"
time="2018-03-19T03:31:17.208009069Z" level=error msg="Upload failed, retrying: net/http: HTTP/1.x transport connection broken: write tcp w.x.y.z:39662->z.y.x.w:443: write: broken pipe"
time="2018-03-19T03:31:17.216232506Z" level=error msg="Upload failed, retrying: net/http: HTTP/1.x transport connection broken: write tcp w.x.y.z:39662->z.y.x.w:443: write: broken pipe"
time="2018-03-19T03:31:17.407608372Z" level=error msg="Upload failed, retrying: net/http: HTTP/1.x transport connection broken: write tcp w.x.y.z:39662->z.y.x.w:443: write: broken pipe"
time="2018-03-19T03:31:17.410403394Z" level=error msg="Upload failed, retrying: net/http: HTTP/1.x transport connection broken: write tcp w.x.y.z:39662->z.y.x.w:443: write: broken pipe"
time="2018-03-19T03:31:23.432621075Z" level=error msg="Upload failed, retrying: unexpected EOF"
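
For anyone comparing runs, a small helper like the one below can tally these retry errors by cause. It is only a sketch: the function name `tally_upload_errors` is made up for illustration, and it assumes the daemon log lines look like the ones above.

```shell
# Tally dockerd "Upload failed, retrying" errors by cause.
# Reads daemon log lines on stdin and prints "<cause>: <count>" per cause.
# The function name is illustrative, not part of any tool.
tally_upload_errors() {
  awk '
    /Upload failed, retrying/ {
      if ($0 ~ /broken pipe/)         cause = "broken pipe"
      else if ($0 ~ /unexpected EOF/) cause = "unexpected EOF"
      else                            cause = "other"
      count[cause]++
    }
    END { for (c in count) printf "%s: %d\n", c, count[c] }
  '
}

# Usage (pod and container names are placeholders):
#   kubectl logs <drone-pod> -c <dind-container> | tally_upload_errors | sort
```

Run against the six lines above, this would report five broken-pipe errors and one unexpected EOF.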

Pushing to (what I assume is) an older registry version works perfectly.

When pushing to gcr while there is no docker container networking enabled, then it also works perfectly.

Here are the docker commands being run. Sensitive data has been omitted.

docker network create test-network && \
docker run --network=test-network -d cockroachdb/cockroach:v1.1.2 -c /cockroach sql --insecure && \
docker run --rm -it \
  -e GKE_CLUSTER_NAME=my-cluster-1 \
  -e GKE_CLUSTER_ZONE=us-east1-b \
  -e GCP_PROJECT=my-gcp-project \
  -e DOCKER_USE_GCP=true \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network=test-network \
  us.gcr.io/my-project/runner \
  /bin/sh -c 'mkdir -p src/git.example.com/project && git clone https://user:<password>@git.example.com/project/project $GOPATH/src/git.example.com/project/project && cd $GOPATH/src/git.example.com/project/project && git checkout gcr && jules -stage deploy_docker'

The jules -stage deploy_docker command runs go build, docker build, and then gcloud docker -- push... in 8 different directories simultaneously.
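
To make the concurrency concrete, here is a rough shell sketch of that fan-out pattern. It is not how jules is actually implemented; `deploy_one` and `deploy_all` are hypothetical names, and the real build and push commands are left as a comment so the sketch runs without Docker or gcloud.

```shell
# Hypothetical stand-in for the per-directory work done by the deploy stage.
deploy_one() {
  dir=$1
  # In the real pipeline this would be roughly (image name is a placeholder):
  #   (cd "$dir" && go build && docker build -t "$IMAGE" . \
  #     && gcloud docker -- push "$IMAGE")
  echo "deployed $dir"
}

# Start one background job per directory, then wait for all of them.
# Returns non-zero if any job failed.
deploy_all() {
  pids=""
  for dir in "$@"; do
    deploy_one "$dir" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage: deploy_all service-a service-b service-c
```

Dropping the `&` makes the jobs run one at a time, which may help tell whether the simultaneous uploads play any part in the failure.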

So, summary:

Kubernetes pod + docker-in-docker + gcloud docker push results in a consistently interrupted connection.

Is there a Docker daemon or Kubernetes network setting I could change to mitigate this? At the very least, I want to understand why this is happening.

Thanks!


Update:

This doesn't even require Kubernetes to happen!

I just tried it with a fresh GCE instance running Ubuntu and it happens there, too.


Solution

  • I contacted GCR support about this issue, as it seemed to only happen with GCR, and they informed me that the IAM account that was attempting to push to the registry was actually the default service account for GCE instances, and not the account that I provided to my Dockerfile.

    However, that did not explain the "Broken pipe" and "EOF" errors when I should have been getting 401 - Unauthorized.

    I attempted the same push with the google/cloud-sdk docker image and it worked fine when I provided it the same key in a similar environment, which told me that the way I had installed gcloud in my docker image was the problem.

    Here's what I had:

    RUN wget https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
    RUN tar -xvf google-cloud-sdk.tar.gz
    RUN rm google-cloud-sdk.tar.gz
    RUN google-cloud-sdk/install.sh --usage-reporting=false \
      --path-update=false \
      --bash-completion=false
    
    ENV PATH="/go/google-cloud-sdk/bin:${PATH}"
    RUN gcloud components install kubectl
    RUN gcloud components install docker-credential-gcr
    

    And here's what google/cloud-sdk had. Updating my Dockerfile to install it this way fixed my problem.

    # Install gcloud
    ENV CLOUD_SDK_VERSION 193.0.0
    
    ARG INSTALL_COMPONENTS
    RUN easy_install -U pip && \
        pip install -U crcmod && \
        export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
        echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
        curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
        apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
        gcloud config set core/disable_usage_reporting true && \
        gcloud config set component_manager/disable_update_check true && \
        gcloud config set metrics/environment github_docker_image && \
        gcloud --version
    

    I'm still clueless as to why this did it for me, so if anyone has any insight that'd be great.
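
Since support found that the push was going out as the default GCE service account, one thing worth checking before pushing is which account gcloud is actually configured to use. The sketch below is an assumption-laden example, not a confirmed fix: the function names, key path, and service account email are placeholders, and it only inspects gcloud's configured account, which may differ from what the Docker credential helper ends up presenting.

```shell
# Print the account gcloud is currently configured to use (empty if unset).
active_gcloud_account() {
  gcloud config get-value account 2>/dev/null
}

# Succeed only if the active account matches the one expected to push.
# $1: expected service account email (placeholder in the usage below).
assert_push_account() {
  expected=$1
  actual=$(active_gcloud_account)
  if [ "$actual" != "$expected" ]; then
    echo "active account is '${actual:-<none>}', expected '$expected'" >&2
    return 1
  fi
  echo "pushing as $actual"
}

# Usage (key path and email are placeholders):
#   gcloud auth activate-service-account --key-file=/path/to/key.json
#   assert_push_account ci-pusher@my-gcp-project.iam.gserviceaccount.com
```

If the check fails inside the runner image but passes inside google/cloud-sdk with the same key, that would at least narrow the difference down to how gcloud was installed or configured.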