cmake fortran gitlab-ci

cmake does not (always) order Fortran modules correctly


I have a code using Fortran modules. I can build it with no problems under normal circumstances. CMake takes care of the ordering of the module files.

However, on a GitLab runner, it SOMETIMES happens that CMake does NOT order the Fortran modules by dependency but alphabetically instead, which then leads to a build failure.

The problem seems to occur at random. I have a branch that built in the CI. After adding a commit that modified a utility script not involved in the build in any way, I ran into this problem. There is no difference in the output of the CMake configure step.

I use the CI's matrix configuration to test different configurations. I found that I could trigger the problem by adding another MPI version (e.g. openmpi/4.1.6). Without that version, it built. With it added to the matrix, ALL configurations showed the problem.

stages:
    - configure
    - build
    - test


.basic_config:
    tags:
        - hpc_runner

    variables:
        # load submodules
        GIT_SUBMODULE_STRATEGY: recursive

.config_matrix:
    extends: .basic_config
    # define job matrix
    parallel:
        matrix:
            - COMPILER: [gcc/9.4.0]
              PARALLELIZATION: [serial, openmpi/3.1.6]
              TYPE: [option1, option2]
              BUILD_TYPE: [debug, release]
            - COMPILER: [gcc/10.3.0, intel/19.0.5]
              PARALLELIZATION: [serial]
              TYPE: [option2]
              BUILD_TYPE: [debug]

###############################################################################
# setup script

# These commands will run before each job.
before_script:
  - set -e
  - uname -a
  - |
    if [[ "$(uname)" = "Linux" ]]; then
      export THREADS=$(nproc --all)
    elif [[ "$(uname)" = "Darwin" ]]; then
      export THREADS=$(sysctl -n hw.ncpu)
    else
      echo "Unknown platform. Setting THREADS to 1."
      export THREADS=1
    fi

  # load environment
  - source scripts/build/load_environment $COMPILER $BUILD_TYPE $TYPE $PARALLELIZATION
  # set path for build folder
  - build_path=build/$COMPILER/$PARALLELIZATION/$TYPE/$BUILD_TYPE

configure:
    stage: configure
    extends: .config_matrix
    script:
        - mkdir -p $build_path
        - cd $build_path
        - $CMAKE_COMMAND
    artifacts:
        paths:
            - build
        expire_in: 1 days  


###############################################################################
# build script

build: 
    stage: build
    extends: .config_matrix
    script:
        - cd $build_path
        - make
    artifacts:
        paths:
            - build
        expire_in: 1 days
    needs:
        - configure

###############################################################################
# test

test: 
    stage: test
    extends: .config_matrix
    script:
        - cd $build_path
        - ctest --output-on-failure
    needs:
        - build

The runner runs on an HPC machine with a complex setup, and I am not too familiar with the exact configuration. I have contacted the admin about this problem, but wanted to see whether anybody else has run into this before and has a solution or hints on what is going on.


Solution

  • With help from our admin, I figured it out.

    The problem comes from CMake using absolute paths. The runner actually consists of several runners for parallel jobs, each using a different prefix path, e.g. /runner/001/ or /runner/012/. So when I run configure on a specific runner, CMake saves that runner's prefix path in the configuration.

    In the build stage, there is no guarantee that the same configuration runs on the same runner. However, since the makefiles contain absolute paths, make tries to access the folders under the configure runner's prefix. What it finds there can be anything: nothing at all, old files from previous pipelines, or the correct files downloaded by another case.

    The only fix I can currently see is to run everything on the same runner in one stage, to avoid the roulette of prefix paths. If anybody has a different idea, or knows a way to pin a specific matrix case to a specific runner prefix, please comment.
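
A minimal sketch of the single-job workaround described above, assuming the rest of the pipeline from the question (the .config_matrix template, the global before_script that sets build_path, and $CMAKE_COMMAND) stays unchanged. The job name build_and_test is hypothetical. It would replace the separate configure, build and test jobs, so that configure, make and ctest all see the same prefix path:

```yaml
# Hypothetical combined job: configure, build and test run in one job,
# so the absolute paths CMake writes at configure time are still valid
# when make and ctest run.
build_and_test:
    stage: build
    extends: .config_matrix
    script:
        - mkdir -p $build_path
        - cd $build_path
        - $CMAKE_COMMAND
        - make
        - ctest --output-on-failure
```

Since nothing is handed over between jobs any more, the build folder no longer has to be passed on as an artifact. The trade-off is that a failing test forces a full reconfigure and rebuild on retry.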