Tags: docker, dockerfile, google-cloud-build, kaniko

npm run build is not cached when running docker build with kaniko cache


I'm trying to speed up my Google Cloud Build for a React application (github repo). Therefore I started using the Kaniko cache, as suggested in the official Cloud Build docs.

It seems the npm install part of my build process is now indeed cached. However, I would have expected that npm run build would also be cached when source files haven't changed.

My Dockerfile:

# Base image has ubuntu, curl, git, openjdk, node & firebase-tools installed
FROM gcr.io/team-timesheets/builder as BUILDER   

## Install dependencies for functions first
WORKDIR /functions
COPY functions/package*.json ./

RUN npm ci

## Install app dependencies next
WORKDIR /
COPY package*.json ./

RUN npm ci

# Copy all app source files
COPY . .

# THIS SEEMS TO BE NEVER CACHED, EVEN WHEN SOURCE FILES HAVENT CHANGED
RUN npm run build:refs \
    && npm run build:production

ARG VCS_COMMIT_ID
ARG VCS_BRANCH_NAME
ARG VCS_PULL_REQUEST
ARG CI_BUILD_ID
ARG CODECOV_TOKEN

ENV VCS_COMMIT_ID=$VCS_COMMIT_ID
ENV VCS_BRANCH_NAME=$VCS_BRANCH_NAME
ENV VCS_PULL_REQUEST=$VCS_PULL_REQUEST
ENV CI_BUILD_ID=$CI_BUILD_ID
ENV CODECOV_TOKEN=$CODECOV_TOKEN

RUN npm run test:cloudbuild \
    && if [ "$CODECOV_TOKEN" != "" ]; \
        then curl -s https://codecov.io/bash | bash -s - -X gcov -X coveragepy -X fix -s coverage; \
    fi

WORKDIR /functions

RUN npm run build

WORKDIR /

ARG FIREBASE_PROJECT_ID
ARG FIREBASE_TOKEN

RUN if [ "$FIREBASE_TOKEN" != "" ]; \
       then firebase deploy --project $FIREBASE_PROJECT_ID --token $FIREBASE_TOKEN; \
    fi

Build output:

BUILD
Pulling image: gcr.io/kaniko-project/executor:latest
latest: Pulling from kaniko-project/executor
Digest: sha256:b9eec410fa32cd77cdb7685c70f86a96debb8b087e77e63d7fe37eaadb178709
Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
gcr.io/kaniko-project/executor:latest
INFO[0000] Resolved base name gcr.io/team-timesheets/builder to builder 
INFO[0000] Using dockerignore file: /workspace/.dockerignore 
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image gcr.io/team-timesheets/builder 
INFO[0000] Built cross stage deps: map[]                
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image manifest gcr.io/team-timesheets/builder 
INFO[0000] Retrieving image gcr.io/team-timesheets/builder 
INFO[0001] Executing 0 build triggers                   
INFO[0001] Resolving srcs [functions/package*.json]...  
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:9307850446a7754b17d62c95be0c1580672377c1231ae34b1e16fc284d43833a... 
INFO[0001] Using caching version of cmd: RUN npm ci     
INFO[0001] Resolving srcs [package*.json]...            
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:7ca523b620323d7fb89afdd0784f1169c915edb933e1d6df493f446547c30e74... 
INFO[0001] Using caching version of cmd: RUN npm ci     
INFO[0001] Checking for cached layer gcr.io/team-timesheets/app/cache:1fd7153f10fb5ed1de3032f00b9fb904195d4de9dec77b5bae1a3cb0409e4530... 
INFO[0001] No cached layer found for cmd RUN npm run build:refs     && npm run build:production 
INFO[0001] Unpacking rootfs as cmd COPY functions/package*.json ./ requires it. 
INFO[0026] WORKDIR /functions                           
INFO[0026] cmd: workdir                                 
INFO[0026] Changed working directory to /functions      
INFO[0026] Creating directory /functions                
INFO[0026] Taking snapshot of files...                  
INFO[0026] Resolving srcs [functions/package*.json]...  
INFO[0026] COPY functions/package*.json ./              
INFO[0026] Resolving srcs [functions/package*.json]...  
INFO[0026] Taking snapshot of files...                  
INFO[0026] RUN npm ci                                   
INFO[0026] Found cached layer, extracting to filesystem 
INFO[0029] WORKDIR /                                    
INFO[0029] cmd: workdir                                 
INFO[0029] Changed working directory to /               
INFO[0029] No files changed in this command, skipping snapshotting. 
INFO[0029] Resolving srcs [package*.json]...            
INFO[0029] COPY package*.json ./                        
INFO[0029] Resolving srcs [package*.json]...            
INFO[0029] Taking snapshot of files...                  
INFO[0029] RUN npm ci                                   
INFO[0029] Found cached layer, extracting to filesystem 
INFO[0042] COPY . .                                     
INFO[0043] Taking snapshot of files...                  
INFO[0043] RUN npm run build:refs     && npm run build:production 
INFO[0043] Taking snapshot of full filesystem...        
INFO[0061] cmd: /bin/sh                                 
INFO[0061] args: [-c npm run build:refs     && npm run build:production] 
INFO[0061] Running: [/bin/sh -c npm run build:refs     && npm run build:production] 

> [email protected] build:refs /
> tsc -p common


> [email protected] build:production /
> webpack --env=prod

Hash: e33e0aec56687788a186
Version: webpack 4.43.0
Time: 81408ms
Built at: 12/04/2020 6:57:57 AM
....

Now, with the overhead of the cache system, there doesn't even seem to be a speed benefit.

I'm relatively new to Dockerfiles, so hopefully I'm just missing a simple line here.


Solution

  • Short answer: Cache invalidation is hard.

    A RUN instruction in a Dockerfile can execute any command. Docker (when using local caching) or Kaniko has to decide whether such a step can be served from the cache. Conceptually, the question is whether the step is deterministic. In other words: if the same command is run again on top of the same previous image, does it produce the same file changes as before? The builder cannot verify this directly, so it roughly keys the cache on the instruction itself and on the layers and copied files that precede it.

    Now, this simplistic view is not enough to identify a cacheable command, because a command can have side effects that do not touch the local filesystem, for example network traffic. If you run curl -XPOST https://notify.example.com/build/XYZ to report a successful or failed build to some notification API, that step should not be cached. Likewise, if your command generates a random password for an admin user and saves it to an external database, that step should never be cached either.

    On the other hand, a fully reproducible npm run build can still produce two different bundles because of the way minifiers and bundlers work, e.g. when minified and uglified builds end up with different short variable names. The resulting builds are semantically the same but differ at the byte level, so although this step could be cached, Docker or Kaniko has no way of recognizing that.

    Distinguishing between cacheable and non-cacheable behavior is basically impossible in general, so you will keep running into false positives and false negatives in caching.

    When I consult with clients on building pipelines, I usually split Dockerfiles into stages, or move the cache hit-or-miss logic into a script when Docker decides wrongly for a certain step.

    When you split Dockerfiles, you have a base image (containing all dependencies and other preparation steps) and move the custom-cacheable part into its own Dockerfile, which references that base image. This usually means you need some form of templating (e.g. a FROM ${BASE_IMAGE} line at the start, which is then rendered via envsubst or a more complex system like helm).
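To make the split-Dockerfile idea concrete, here is a minimal sketch of the templating step. It is an illustration under assumptions, not a drop-in setup: the function name render_app_dockerfile, the file name Dockerfile.app and the base image tag in the usage note are hypothetical, and a plain shell heredoc stands in for envsubst so the sketch needs no extra tools.

```shell
#!/usr/bin/env bash
# Sketch: render the second-stage Dockerfile on top of a prebuilt base image.
# The base image is assumed to already contain the results of both `npm ci` steps.

# render_app_dockerfile BASE_IMAGE
# Prints a Dockerfile whose FROM line points at BASE_IMAGE.
render_app_dockerfile() {
  local base_image="$1"
  cat <<EOF
FROM ${base_image}
COPY . .
RUN npm run build:refs && npm run build:production
EOF
}
```

A call such as `render_app_dockerfile "gcr.io/team-timesheets/builder-deps:latest" > Dockerfile.app` (hypothetical tag) writes the rendered file, which can then be handed to the builder, for example with docker build -f Dockerfile.app .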

    If that is not suitable for your use case, you can implement the logic yourself in a script. To find out which files changed, you can use git diff --name-only HEAD HEAD~1. Combined with some more logic, this lets the script perform certain steps only if a certain set of files changed:

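    #!/usr/bin/env bash
    # Only rebuild if the last commit touched something under 'app/' or a
    # package* file (requires at least two commits in the checkout).
    # grep -E is needed for the (a|b) alternation; -q only sets the exit code.
    if git diff --name-only HEAD~1 HEAD | grep -qE '^(app/|package.*)'; then
      npm run build:refs
      curl -XPOST "https://notify.api/deploy/$(git rev-parse --short HEAD)"
      # ... further steps ...
    fi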
    

    You can easily extend this logic to your exact needs and take full control over the caching logic yourself. However, you should do this only for steps where Docker or Kaniko produces false positives or false negatives, since all following steps will not be cached because of the non-deterministic behavior.
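As one possible variation on the script above, the hit-or-miss decision can be based on a content hash of the input files instead of git history. The sketch below is an assumption-laden illustration: the helper names inputs_hash and run_if_changed are hypothetical, it relies on GNU find, sort and sha256sum, and it leaves open where the stamp file is persisted between builds. The stamp file has to live outside the hashed directory, otherwise writing it would change the hash.

```shell
#!/usr/bin/env bash
# Sketch: run a command only when the contents of a directory have changed.

# inputs_hash DIR
# Prints one SHA-256 digest over the names and contents of all files below DIR.
inputs_hash() {
  ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
      | xargs -0 sha256sum | sha256sum | cut -d' ' -f1 )
}

# run_if_changed DIR STAMP_FILE CMD [ARGS...]
# Runs CMD only if the hash of DIR differs from the one stored in STAMP_FILE,
# and updates STAMP_FILE after CMD succeeds.
run_if_changed() {
  local dir="$1" stamp="$2"
  shift 2
  local current
  current="$(inputs_hash "$dir")" || return 1
  if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
    echo "inputs unchanged, skipping: $*"
    return 0
  fi
  "$@" && printf '%s\n' "$current" > "$stamp"
}
```

A call could look like `run_if_changed src /cache/build.stamp npm run build:production`, where both paths are placeholders for your own layout.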