Tags: docker, gradle, gitlab, gitlab-ci, gitlab-pipelines

How to make a GitLab Pipeline not start each Stage from scratch?


I want to use a GitLab CI/CD pipeline to build an app and run some tests. I've set up a .gitlab-ci.yml for the project, like this:

default:
  image: docker-image-name

build-app:
  stage: build
  script:
    - compile code... # Compile and build

run-unit-tests:
  stage: test
  script:
    - run tests... # Run the unit tests after the build stage has finished

GitLab Pipeline screenshot - Visualize the YML stages and Jobs

I'm using a Docker image to build the app. The YAML script works, but it's a bit inefficient. When the "build" stage runs, it starts from an empty folder, downloads the code with a Git clone, downloads the libraries and dependencies, and then builds the app. That's all fine.

The problem happens with the "test" stage. This runs after the "build" stage, but it restarts everything from scratch: it starts with an empty folder, does a Git clone, downloads the dependencies, and then runs the tests (which requires another rebuild). This is slow and inefficient, and it duplicates work.

What I want is for the pipeline to just continue from the previous stage, and re-use the output from it, without starting from an empty folder. So the "test" stage should use all the output from the "build" stage, and just continue from the last step in the "build" stage.

Is this possible? How do I do that?

Optional: Why is CI/CD set up this way, restarting from scratch? Bitbucket Pipelines is similar, where each step starts from a blank folder. I know that it makes each stage self-contained and independent, but why would anyone need that ability?

My question is very similar to this one: How to set up GitLab CI to run multiple build steps efficiently while indicating at which step it is?


Note 1: The GitLab documentation doesn't mention anything about each stage starting from scratch.

Note 2: The same problem happens when using multiple Jobs in one stage, like this:

build-app-variant1:
  stage: build
  script:
    - build the first variant of the app # This runs first

build-app-variant2:
  stage: build
  script:
    - build the second variant of the app # This runs independently in parallel, starting from a blank folder

Each Job will start from scratch and take more time than necessary, instead of continuing from the previous part of the script.


Solution

  • Summarizing the comments on the question as an answer:

    As mentioned by Michael Delgado in his comments, this is the normal, expected behavior. All the major CI/CD providers work this way when using Docker containers to build software in a pipeline, so it's not considered a problem. If you start a new build Stage, you get a new container: that's how Docker works. A container always starts from the image's last saved layer, so the working directory from a previous Stage is lost.

    Some example ways to workaround it:

    • The simplest way is just not to use multiple Stages. For example, run the tests as a second command after the build in the same job, or put your entire build script in one Stage. The trade-off is that you lose the visual partitioning and organization of multiple build stages.
    • Use build artifacts by specifying the paths to the built app binaries. These are uploaded at the end of the previous Stage, then downloaded automatically at the start of the next Stage.
    • Use a container registry
    • Push files and folders to your own independent online storage provider, and then download them manually
    • Build a Docker image, cache it in the project's registry, and then reuse it in a later stage.
    • Check whether the "shared workspaces" feature request has been implemented: https://gitlab.com/gitlab-org/gitlab/-/issues/29265 (this is mentioned in the similar Stack Overflow question linked above)
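The single-Stage workaround from the first bullet can be sketched like this, reusing the placeholder image name and commands from the question (both are assumptions, not real commands):

```yaml
default:
  image: docker-image-name  # placeholder image name from the question

# One job does both steps, so the tests run in the same container
# and reuse the build output directly -- no second clone or rebuild.
build-and-test:
  stage: build
  script:
    - compile code...  # Compile and build
    - run tests...     # Run the unit tests against the output above
```

The cost, as noted above, is that the pipeline graph shows a single job instead of separate build and test stages.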
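The artifacts workaround can be sketched as below. The output directory `build/` is a hypothetical path; substitute whatever your build actually produces. `artifacts:paths` uploads the listed files when the job finishes, and jobs in later stages download them automatically before their script runs:

```yaml
default:
  image: docker-image-name  # placeholder image name from the question

build-app:
  stage: build
  script:
    - compile code...  # Compile and build
  artifacts:
    paths:
      - build/         # hypothetical output directory, uploaded after the job
    expire_in: 1 hour  # optional: don't keep intermediate artifacts forever

run-unit-tests:
  stage: test
  needs: ["build-app"]  # fetches build-app's artifacts (and lets this job start as soon as build-app finishes)
  script:
    - run tests...     # the downloaded build/ directory is available here
```

Note that the test job still starts from a fresh clone; only the declared artifact paths carry over. For dependency downloads (rather than build output), GitLab's separate `cache:` keyword can persist directories such as a package cache between jobs.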