
Jenkins + Git: Only build if PR introduced changes in subdirectory


We have a large monorepo with multiple projects (A and B) inside of it. I currently have Jenkins set up as a Multibranch Pipeline project that watches the monorepo for PRs. If a PR is created, Jenkins builds both A and B.

Now, I want Jenkins to be smarter and only build project A if any change in the PR introduced a change in the A/ directory. This is proving very difficult.

when { changeset "A/" } only appears to check if the last commit changed a file in A/, not if the PR changed a file in A/.

So I made it smarter using https://issues.jenkins-ci.org/browse/JENKINS-54285 and did:

when { expression { return sourceChanged("A/") } }

with sourceChanged defined as:

def boolean sourceChanged(String module) {
    if (env.CHANGE_TARGET == null)
        return true;

    def MASTER = sh(returnStdout: true, script: "git rev-parse origin/${env.CHANGE_TARGET}").trim()
    def HEAD = sh(returnStdout: true, script: "git show -s --no-abbrev-commit --pretty=format:%P%n%H%n HEAD | tr ' ' '\n' | grep -v ${MASTER} | head -n 1").trim()

    return sh(returnStatus: true, script: "git diff --exit-code --name-only ${MASTER}...${HEAD} ${module}") == 1;
}

However, no matter what I try, I can't get a commit hash for the CHANGE_TARGET. I always get an error along these lines:

git rev-parse origin/master
fatal: ambiguous argument 'origin/master': unknown revision or path not in the working tree.

Why isn't Git able to find master, origin/master, refs/head/master, etc (I tried them all)? Is there an easier way to accomplish what I am trying to do?


I'm using jenkins/jenkins:lts from Docker Hub as well as the Bitbucket Branch Source plugin.

Here is the relevant Jenkins log sequence, if it helps:

Fetching changes from 2 remote Git repositories
 > git config remote.origin.url http://bitbucket.ccm.com:7990/scm/JUP/jt.git # timeout=10
Fetching without tags
Fetching upstream changes from http://bitbucket.ccm.com:7990/scm/JUP/jt.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials 
 > git fetch --no-tags --progress -- http://bitbucket.ccm.com:7990/scm/JUP/jt.git +refs/pull-requests/9/from:refs/remotes/origin/PR-9
 > git config remote.upstream.url http://bitbucket.ccm.com:7990/scm/JUP/jt.git # timeout=10
Fetching without tags
Fetching upstream changes from http://bitbucket.ccm.com:7990/scm/JUP/jt.git
using GIT_ASKPASS to set credentials 
 > git fetch --no-tags --progress -- http://bitbucket.ccm.com:7990/scm/JUP/jt.git +refs/heads/master:refs/remotes/upstream/master
Merging remotes/upstream/master commit 7ef64efeb0fb19d8931a684f147666ae681b4ddf into PR head commit 47600816c0dca3e5555e417085ab2052453a39b2
Enabling Git LFS pull
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 47600816c0dca3e5555e417085ab2052453a39b2
 > git config --get remote.origin.url # timeout=10
using GIT_ASKPASS to set credentials 
 > git lfs pull origin
 > git merge 7ef64efeb0fb19d8931a684f147666ae681b4ddf # timeout=10
 > git rev-parse HEAD^{commit} # timeout=10
Merge succeeded, producing 47600816c0dca3e5555e417085ab2052453a39b2
Checking out Revision 47600816c0dca3e5555e417085ab2052453a39b2 (PR-9)
Enabling Git LFS pull
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 47600816c0dca3e5555e417085ab2052453a39b2
 > git config --get remote.origin.url # timeout=10
using GIT_ASKPASS to set credentials 
 > git lfs pull origin
Commit message: "l"
[Pipeline] withEnv
[Pipeline] {
[Pipeline] sh
+ docker inspect -f . registry.ccm.com:7991/jt:1.0
.
[Pipeline] withDockerContainer
Jenkins seems to be running inside container fdc7e8eec5ea708e59cebe4682651bc5192478b95de803b5981edd222f39af97
$ docker run -t -d -u 1000:979 -v $PWD:/build_env -v $HOME/.ssh:/home/docker_user/.ssh -w /build_env --add-host civm3:10.33.67.183 -e UNIX_USER=jtbuild -w /var/jenkins_home/workspace/jt_PR-9@2 --volumes-from fdc7e8eec5ea708e59cebe4682651bc5192478b95de803b5981edd222f39af97 -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** registry.ccm.com:7991/jt:1.0 cat
$ docker top c7bb23bbc91119c2b1875ab2a9186ae34da1754f2b8ae42f758594227ff77137 -eo pid,comm
[Pipeline] {
[Pipeline] sh
+ git rev-parse origin/master
fatal: ambiguous argument 'origin/master': unknown revision or path not in the working tree.

All I want is access to the two relevant commit ids in the Jenkinsfile: 7ef64efeb0fb19d8931a684f147666ae681b4ddf and 47600816c0dca3e5555e417085ab2052453a39b2!


Solution

  • Ok, I finally solved it.

    It turns out that Jenkins does not do a full clone here. As the log shows, it fetches with narrow, explicit refspecs, so the workspace only contains the refs that were fetched that way. The usual branch names (master, origin/master, and so on) are therefore not available; in particular, master is stored under the upstream remote rather than under origin.

    The key is in these 2 lines of the log:

    > git fetch --no-tags --progress -- http://bitbucket.ccm.com:7990/scm/JUP/jt.git +refs/pull-requests/9/from:refs/remotes/origin/PR-9 
    > git fetch --no-tags --progress -- http://bitbucket.ccm.com:7990/scm/JUP/jt.git +refs/heads/master:refs/remotes/upstream/master
    

    Here is a shortened, annotated version of those two commands:

    > git fetch the PR ref, store it as 'origin/PR-9'
    > git fetch master ref, store it as 'upstream/master'
    

    Thus, the two commits of interest are stored in origin/PR-9 and upstream/master.
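
    To see this behaviour outside Jenkins, the following shell sketch may help. It assumes git is installed and uses two throwaway repositories as hypothetical stand-ins for the Bitbucket repo and the Jenkins workspace; the paths, file name, and commit identity are made up. It performs a fetch with the same refspec shape as in the log and then checks which names resolve:

    ```shell
    #!/bin/sh
    # Sketch only: reproduce the narrow-refspec fetch from the log in two throwaway repos.
    work=$(mktemp -d)
    src="$work/src"   # hypothetical stand-in for the Bitbucket repository
    ws="$work/ws"     # hypothetical stand-in for the Jenkins workspace

    git init -q "$src"
    git -C "$src" symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
    echo hello > "$src/file.txt"
    git -C "$src" add file.txt
    git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

    git init -q "$ws"
    # Same shape as the second fetch in the log: master is stored under 'upstream'.
    git -C "$ws" fetch -q --no-tags "$src" +refs/heads/master:refs/remotes/upstream/master

    # List every ref that exists in the workspace after the fetch.
    refs=$(git -C "$ws" for-each-ref --format='%(refname)')
    echo "refs in workspace: $refs"

    # origin/master was never fetched, so rev-parse cannot resolve it.
    if git -C "$ws" rev-parse --verify --quiet origin/master >/dev/null; then
        origin_ok=yes
    else
        origin_ok=no
    fi
    upstream_sha=$(git -C "$ws" rev-parse upstream/master)
    source_sha=$(git -C "$src" rev-parse HEAD)
    echo "origin/master resolvable: $origin_ok"
    echo "upstream/master: $upstream_sha"

    rm -rf "$work"
    ```

    If this runs as expected, origin/master does not resolve while upstream/master points at the source commit, which matches the error in the question.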

    Conveniently, the Jenkins environment variables BRANCH_NAME and CHANGE_TARGET contain PR-9 and master respectively.

    Thus, the Jenkinsfile should use the following:

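    // Returns true if the PR changed anything under the given path, relative to its target branch.
    def boolean sourceChanged(String module) {
        def target_branch = env.CHANGE_TARGET;  // e.g. "master"
        def pr_ref        = env.BRANCH_NAME;    // e.g. "PR-9"
    
        // Without a target branch there is nothing to compare against, so always build.
        if (target_branch == null) {
            echo "No target branch defined...";
            return true;
        }
    
        // Resolve the two refs that the multibranch checkout fetched.
        def TARGET = sh(returnStdout: true, script: "git rev-parse upstream/${target_branch}").trim()
        def HEAD   = sh(returnStdout: true, script: "git rev-parse origin/${pr_ref}").trim()
    
        echo "Checking for source changes between ${TARGET} (${target_branch}) and ${HEAD} (${pr_ref})...";
        // With --exit-code, git diff exits with 1 when there are differences and 0 when there are none.
        return sh(returnStatus: true, script: "git diff --exit-code --name-only ${TARGET}...${HEAD} -- ${module}") == 1;
    }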
    

    in conjunction with a condition such as:

    when { expression { return sourceChanged("A/") } }
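
    The check hinges on the exit status of git diff with --exit-code and a three-dot range. As a rough illustration, the shell sketch below builds a throwaway repository with A/ and B/ directories (mirroring the question; the file names and commit identity are made up), changes only A/ on a hypothetical "pr" branch, and records the exit status for each directory:

    ```shell
    #!/bin/sh
    # Sketch only: show the exit status of the three-dot diff used by sourceChanged().
    repo=$(mktemp -d)
    git init -q "$repo"
    commit() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

    mkdir "$repo/A" "$repo/B"
    echo a > "$repo/A/a.txt"
    echo b > "$repo/B/b.txt"
    git -C "$repo" add A B
    commit "base"
    target=$(git -C "$repo" rev-parse HEAD)   # plays the role of upstream/master

    git -C "$repo" checkout -q -b pr
    echo change >> "$repo/A/a.txt"
    git -C "$repo" add A
    commit "touch A only"
    head=$(git -C "$repo" rev-parse HEAD)     # plays the role of origin/PR-9

    # Exit status 1 means the path differs between the merge base and the PR head; 0 means it does not.
    git -C "$repo" diff --exit-code --name-only "$target...$head" -- A/ >/dev/null
    rc_a=$?
    git -C "$repo" diff --exit-code --name-only "$target...$head" -- B/ >/dev/null
    rc_b=$?
    echo "A/ exit status: $rc_a"
    echo "B/ exit status: $rc_b"

    rm -rf "$repo"
    ```

    The three-dot form compares the PR head against its merge base with the target, so commits that landed on the target branch after the PR branched off should not count as PR changes.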
    

    Checking for diffs in multiple directories can be done as follows:

    def SOURCE_DIRS = [
        "A/",
        "X/"
    ];
    ...
    when { expression { return sourceChanged(SOURCE_DIRS) } }
    ...
    def sourceChanged(ArrayList<String> source_dirs) {
        def target_branch = env.CHANGE_TARGET;
        def pr_ref        = env.BRANCH_NAME;
    
        if (target_branch == null) {
            echo "No target branch defined...";
            return true;
        }
    
        def TARGET = sh(returnStdout: true, script: "git rev-parse upstream/${target_branch}").trim()
        def HEAD   = sh(returnStdout: true, script: "git rev-parse origin/${pr_ref}").trim()
    
        echo "Checking for source changes between ${TARGET} (${target_branch}) and ${HEAD} (${pr_ref})...";
        for (String dir : source_dirs) {
            def rc = sh(returnStatus: true, script: "git diff --name-only --exit-code ${TARGET}...${HEAD} -- ${dir}");
            if (rc == 1) {
                echo "Changes detected in ${dir}!";
                return true;
            }
        }
    
        echo "No changes detected.";
        return false;
    }