
Google Cloud Build - cache and optimize build time


I'm using google cloud build and a new build is triggered each time a new commit is pushed to my branch.

Everything is working fine (artifacts and cloud run machine) but I'm trying to speed up the build time.

I've seen a lot of tutorials and SO answers, but I can't figure out how to optimize in my case. I'm using a cloudbuild.yaml file and Docker, both visible here:

CLOUDBUILD.YAML

steps:
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - '-f'
      - 'Dockerfile'
      - '-t'
      - '${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:$SHORT_SHA'
      - .
  - name: gcr.io/cloud-builders/docker
    args:
      - push
      - '${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:$SHORT_SHA'
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    args:
      - run
      - deploy
      - '${_RUN_SERVICE}'
      - '--image=${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:$SHORT_SHA'
      - '--region=${_RUN_REGION}'
      - '--service-account=${_ACCOUNT_EMAIL}'
      - '--platform=managed'
      - '--ingress=internal-and-cloud-load-balancing'
      - '--concurrency=200'
      - '--timeout=600'
      - '--min-instances=1'
      - '--max-instances=2'
      - '--cpu=1'
      - '--memory=256Mi'
      - '--cpu-boost'
      - '--cpu-throttling'
    entrypoint: gcloud
images:
  - '${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:$SHORT_SHA'

DOCKERFILE

FROM node:20.5.1-alpine AS step1
LABEL version="23.10.01"
WORKDIR /temp
COPY package.json .
RUN npm install
COPY ./tsconfig.json .
COPY src/ ./src
RUN npm run build

FROM node:20.5.1-alpine AS step2
WORKDIR /app/
COPY package.json .
RUN npm install --omit=dev && npm cache clean --force
COPY --from=step1 /temp/build ./build

ENV NODE_ENV production
ENV APP_ENV dev

EXPOSE $PORT
CMD [ "npm", "run", "start:prod" ]

Every time a new build starts, the entire process is repeated and takes more than 10 minutes, whereas on my local machine, using the cache, it takes no more than 10 seconds.


Solution

  • First of all, it would be good to see which step is taking a long time and then fix that specific problem. Since I have no idea which step in your build is the slow one, I can only share some of my observations about optimizing builds in Cloud Build.

    Your build pipeline can be improved right away. Briefly, the improvements in this case:

    1. you can store the node dependency cache in Google Cloud Storage (this can get expensive in terms of data storage, and I recommend tackling pipeline caching only after cloudbuild.yaml performs its main task, i.e. successfully builds the project);
    2. move the npm install and npm build stages from the Dockerfile into cloudbuild.yaml (a trimmed Dockerfile sketch follows the example below);
    3. use parallel steps (documentation), but remember that the default Cloud Build machine, type e2-standard-2, has only 2 cores; this can be configured (documentation);
    4. maybe merge the "build" and "push" steps for Docker (this can either reduce or increase the build time - it requires experimenting in each specific case);
    5. use a lighter builder (the documentation says that you can use not only official builders or community builders, but also any container that fits the task);
    6. use Docker caching (Docker documentation or this somewhat dated documentation) - a short sketch of this approach follows this list;
    7. use only one Docker image push method!
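
    A minimal sketch of point 6 (this fragment is mine, not part of the original pipeline): the classic --cache-from approach without BuildKit, reusing the substitutions from the question and assuming previous builds pushed a ':latest' tag. The pull is allowed to fail on the very first run.

    steps:
      - name: 'gcr.io/cloud-builders/docker'
        entrypoint: 'bash'
        # pull the previous image so its layers can be reused as cache (ignore failure on the first build)
        args: [ '-c', 'docker pull ${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:latest || exit 0' ]
      - name: 'gcr.io/cloud-builders/docker'
        args:
          - build
          - '-t'
          - '${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:$SHORT_SHA'
          # tell Docker to reuse layers from the previously pulled image where possible
          - '--cache-from'
          - '${_IMG_REGION}-docker.pkg.dev/$PROJECT_ID/${_IMG_REPO}/${_IMG_NAME}:latest'
          - .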

    Some thoughts on point 7.

    ! In general, you don't need the separate "push" step, because the "images" section at the end of cloudbuild.yaml means that every image listed there is uploaded to Artifact Registry automatically. Right now your cloudbuild.yaml pushes the built Docker image to Artifact Registry twice. You can choose one of these methods: either a separate "push" step or the "images" section at the end of cloudbuild.yaml. That said, I myself use both methods at the same time, and I don't see much difference in build time between one or two ways of pushing Docker images. Why I use both: the two methods are not identical. The manual method with a separate "push" step gives more control over the tags of the created image, while the "images" section adds more meta information to the image when pushing it to Artifact Registry, such as the build id.
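
    For illustration (this fragment is mine, with the image path shortened to IMAGE:TAG for brevity), the two variants look like this - pick one of them:

    # Variant A: no explicit push step; the "images" section uploads the image
    # automatically and attaches extra metadata such as the build id.
    steps:
      - name: gcr.io/cloud-builders/docker
        args: [ 'build', '-t', 'IMAGE:TAG', '.' ]
    images: [ 'IMAGE:TAG' ]

    # Variant B: keep an explicit push step for full control over tags and drop "images":
    #   - name: gcr.io/cloud-builders/docker
    #     args: [ 'push', 'IMAGE:TAG' ]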

    In the following cloudbuild.yaml I will try to demonstrate my suggestions:

    options:
      # choose a slightly better type of machine, and your pipeline will already work faster
      machineType: E2_HIGHCPU_8 # COST: $0.016 / build-minute
      dynamic_substitutions: true
      substitution_option: 'ALLOW_LOOSE'
    
    substitutions:
      _ACCOUNT_EMAIL: '[email protected]' 
      _RUN_SERVICE: 'service_name'
      _RUN_REGION: 'some-run-region'
      _IMG_REGION: 'some-image-region'
      # I'm not sure if you can reuse substitutions within substitutions.
      # I didn't find any information about this in the documentation.
      # And experiments with this were unsuccessful.
      _IMAGE: 'some-image-region-docker.pkg.dev/${PROJECT_ID}/repo-name/image-name'
      _LATEST_TAG_NAME: 'dev-latest' # this can be changed depending on the environment
      
    
    steps:
    
      # retrieve the stored node provisioning cache from a previous cloud build attempt
      - id: get-cache
        waitFor: [ '-' ]
        name: 'gcr.io/cloud-builders/gsutil' 
        # This approach should work for unpacking the local npm package cache.
        # For global packages you would have to mount a volume for each step
        # that uses them, or work something out with NODE_PATH.
        script: |
          #!/usr/bin/env bash
          set -Eeuo pipefail
          gsutil -m cp gs://your-bucket-name/some-folder/cache/cache.tar.gz /tmp/cache.tar.gz || echo "Cache archive not found!"
          tar -xzf /tmp/cache.tar.gz || echo "Cache archive not found!"
    
      # some steps that process the code: compile, test, package.
      - id: node-install
        waitFor: [ 'get-cache' ]
        name: 'node:20.5.1-alpine'
        script: |
          #!/usr/bin/env sh
          set -eu
          echo "You are in '/workspace', which contains the files from your repo!"
          npm install
          echo "You can save some special results in '/workspace'."
          echo "And you'll use them later."
    
      # the previous step can have the same short syntax 
      # if you don't need to use any script
      - id: node-build
        waitFor: [ 'node-install' ]
        name: 'node:20.5.1-alpine'
        args: [ 'npm', 'run', 'build' ]
    
      # Save cache of node packages in Google Cloud Storage
      - id: cache-node-dependencies
        name: 'gcr.io/cloud-builders/gsutil'
        waitFor: [ 'node-build' ]
        script: |
          #!/usr/bin/env bash
          set -Eeuo pipefail
          tar -czf /tmp/cache.tar.gz ./node_modules &&
          gsutil -m cp /tmp/cache.tar.gz gs://your-bucket-name/some-folder/cache/cache.tar.gz
    
      # some independent steps that can be executed in parallel
      - id: independent-step1
        waitFor: [ '-' ]
        name: 'ubuntu'
        args: ['echo', 'hello world1']
      - id: independent-step2
        waitFor: [ '-' ]
        name: 'ubuntu'
        args: ['echo', 'hello world2']
    
      # Build and push Docker image in one step
      - id: build-and-push-new-container
        name: 'docker:rc-cli' # Or another official lightweight Docker image
        waitFor: [ 'node-build', 'independent-step1', 'independent-step2' ]
        args: [ 'build', '--push', # the third way to push the built image
                '-t', '${_IMAGE}:${SHORT_SHA}', # all tags will be pushed
                '-t', '${_IMAGE}:${BRANCH_NAME}',
                '-t', '${_IMAGE}:${_LATEST_TAG_NAME}',
                '--cache-from', '${_IMAGE}:${_LATEST_TAG_NAME}',
                '--build-arg', 'BUILDKIT_INLINE_CACHE=1', # enable the inline cache; it starts paying off on the next build attempt
                '--progress', 'plain', # it gives you more details about the image building process
                '-f', 'Dockerfile', # optional: 'Dockerfile' is the default name, so this flag can be removed
                '.' ]
    
      # Deploy to cloudrun
      - id: cloudrun-deploy
        waitFor: [ 'build-and-push-new-container' ]
        name: gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
        automapSubstitutions: true
        script: |
          #!/usr/bin/env bash    
          set -Eeuo pipefail
          gcloud run deploy ${_RUN_SERVICE} \
          --image=${_IMAGE}:${SHORT_SHA} \
          --region=${_RUN_REGION} \
          --service-account=${_ACCOUNT_EMAIL} \
          --platform=managed \
          --ingress=internal-and-cloud-load-balancing \
          --concurrency=200 \
          --timeout=600 \
          --min-instances=1 \
          --max-instances=2 \
          --cpu=1 \
          --memory=256Mi \
          --cpu-boost \
          --cpu-throttling
    
    
    # The "images" section adds more metadata to the image in Artifact Registry,
    # but even without it the Docker image has already been pushed (via --push above).
    #images: [ '${_IMAGE}' ]
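
    If you follow point 2 and move npm install / npm run build into cloudbuild.yaml as above, the Dockerfile no longer needs a separate build stage. A rough sketch of what it could shrink to (my own illustration, not from the original post, assuming the node-build step produced the build/ folder in the Cloud Build workspace):

    FROM node:20.5.1-alpine
    WORKDIR /app/
    COPY package.json .
    # production dependencies only; the TypeScript compilation already happened in the cloud build steps
    RUN npm install --omit=dev && npm cache clean --force
    # copy the build output produced by the node-build step instead of rebuilding it here
    COPY build ./build

    ENV NODE_ENV=production
    ENV APP_ENV=dev

    EXPOSE $PORT
    CMD [ "npm", "run", "start:prod" ]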
    
    

    P.S. Perhaps it would have been worth writing a separate example for each point, but it seems to me that a single cloudbuild.yaml is more useful.