Search code examples
gitgit-history-graph

How to find whether a given commit is on the first-parent chain of a branch


I'm trying to script a way to tell if a given commit is on the first-parent chain of a given branch. So, for example, merge-base won't fly, as the commit might have been merged in. I want to know whether the exact commit was ever the tip of the branch.

Note: The branch in question is subject to a no-fast-forward merge strategy.


Solution

  • The fancy way

    A simple "is ancestor" test obviously won't do, as commits down 2nd-or-later parent chains are also ancestors:

    ...o--o--A--o--o--o--T
        \            /
     ...-o--*--B----o
             \
              C
    

    Both A and B are ancestors of T, but you want to accept A while rejecting B and C. (Assume --first-parent is the top line here.)

    Using git merge-base, however, will actually do part of the job. You don't need the --is-ancestor mode of git merge-base, though, and do need some additional processing.

    Note that, regardless of the path between T and some ancestor, the merge base of T and that ancestor (such as A or B) is either the ancestor itself (A or B respectively here), or some ancestor of the ancestor, such as commit * if we look at T and C as a pair. (This holds even in the case of multiple merge bases, although I leave constructing a proof of that to you.)

    If the, or any arbitrarily chosen one of the set of all, merge base(s) of the test commit and the branch-tip is not already the test-commit, we have a case like C and can reject it out of hand. (Or, we can use --is-ancestor to reject it, or ... well, see below.) If not, we must enumerate the commits in the ancestry path between the commit in question and the branch tip. For A this is:

             o--o--*--T
    

    and for B this is:

                   *--T
                  /
                 o
    

    If any such commit is a merge commit, as is the one marked *, we need to make sure that the first parent includes one of the commit(s) listed along this path. The toughest cases are those topologically similar to:

           o--o
          /    \
    ...--A      o--T
          \    /
           o--o
    

    since the --ancestry-path between these includes a merge and two ways to reach A, one of which is a first-parent path and one of which is not. (This is true if T itself is a merge as well.)

    We don't actually need to find the merge base in the first place, though. We're only using the merge base in order to examine the ancestry path. If the merge base is not the test commit itself, then the test commit is not an ancestor of the tip commit, and testcommit..tipcommit will not include testcommit itself. Moreover, adding --ancestry-path—which discards all commits that are not themselves children of the left hand side here—will then discard all commits in the git rev-list output: a case like C has no descendants that are ancestors of T (if it did, C would be a merge base).

    Hence, what we want is to examine the commits in git rev-list --ancestry-path testcommit..branchtip. If this list is empty, the test commit is not an ancestor of the branch tip in the first place. We have a case like commit C; so we have our answer. If the list is non-empty, reduce it to its merge components (run again with --merges, or feed the list to git rev-list --stdin --merges, to produce the shrunken list). If this list is non-empty, check each merge by finding its --first-parent ID and making sure the result is in the first list.

    In actual (albeit untested) shell-script code:

    TF=$(mktemp) || exit 1
    trap "rm -f $TF" 0 1 2 3 15
    git rev-list --ancestry-path $testcommit..$branch > $TF
    test -s $TF || exit 1  # not ancestor
    git rev-list --stdin --merges < $TF | while read hash; do
        parent1=$(git rev-parse ${hash}^1)
        grep "$parent1" $TF >/dev/null || exit 1 # on wrong path
    done
    exit 0 # on correct path
    

    The brute-force way

    The above tests as few commits as possible, but it would be a lot more practical, in a sense, to just run:

    git rev-list --first-parent ${testcommit}^@..$branch
    

    If the output includes $testcommit itself, then $testcommit is reachable, by first-parent only, from branch. (We use ^@ to exclude all parents of $testcommit so that this works even for the root commit; for other commits, ${testcommit}^ suffices since we're using --first-parent.) Moreover, if we make sure this is done in topological order, the last commit ID emitted from the git rev-list command will be $testcommit itself if and only if $testcommit is reachable from $branch. Hence:

    hash=$(git rev-parse "$testcommit") || exit 1
    t=$(git rev-list --first-parent --topo-order $branch --not ${hash}^@ | tail -1)
    test $hash = "$t"
    

    should do the trick. The quotes around $t are in case it expands to the empty string.