git-log: output order of ancestor branches for a merge commit with topological ordering

There is a piece of code in my project that relies on the output of this command:

git log -n 2 --topo-order --pretty=format:"%H"

This is applied to merge commits (merged pull requests) on the master branch and assumes that the result is the hash of the current (merge) commit followed by the hash of the parent commit from the feature branch. For example:

master   a - b - m
          \     /
feature    x - y

When executed on merge commit m, the result is assumed to be m y and not m b. I verified this by checking multiple such merge commits: the parent from the feature branch is always returned. However, the documentation for git-log --topo-order makes no guarantee as to which parent branch is printed first.

Can anyone explain how git-log --topo-order chooses which parent branch to print first, and why it always shows the feature branch first in my use case?

Solution

The internal git log algorithm is to insert "new" (unvisited) commits into a priority queue. The overall loop is:

while queue is not empty:
    commit = queue.remove_front()
    ... deal with commit ...

where the "deal with commit" step may insert the commit's parent(s) into the queue.

The --topo-order switch simply (or complicated-ly, as the case actually is) modifies the priority of the commits as Git marches down the two legs of the merge, so that you get everything from one leg, then everything from the other.

As joanis notes in a comment, the documentation explicitly leaves the parent ordering unspecified. This allows the Git implementation to switch to a new algorithm in the future that might pick a different "starting leg". So it's unwise to depend on what you're getting right now.

(I think what you're getting right now is that the parent commits are sorted by committer-date order, so that if parent #1 has an earlier committer timestamp than parent #2, git log --topo-order will trace leg-two first. But this code is extremely messy and there are numerous dark corners, so I'm not willing to say that this is definitely the case.)
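One way to look into this on a particular merge, without relying on log output order, is to print the parents in their recorded order together with their committer timestamps. This is only a sketch: the function name show_parents is made up for illustration, and it only displays the data; it does not prove how --topo-order picks a leg.

```shell
# Hypothetical helper (the name is an assumption, not a Git command):
# list a commit's parents in recorded order with their committer dates.
# Output: one line per parent, "<parent-number> <hash> <committer-unix-time>".
show_parents() {
    n=0
    # %P expands to the parent hashes, space-separated, in parent order
    # (empty for a root commit, so the loop body is then skipped).
    for p in $(git log -1 --format=%P "$1"); do
        n=$((n + 1))
        # %ct is the committer date as a Unix timestamp.
        printf '%s %s %s\n' "$n" "$p" "$(git log -1 --format=%ct "$p")"
    done
}
```

Running show_parents on a merge commit and comparing the two timestamps against the second line of git log -n 2 --topo-order shows whether the committer-date guess holds for that commit.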

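If you know that you have a merge commit hash ID in some variable $H, or found by some name $name, a completely reliable way to get this hash ID and its second parent's hash ID, in that order, is to use git rev-parse. The second parent is the commit that was merged in, which for a feature branch merged into master is the tip of the feature branch (y in the diagram):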

git rev-parse ${H} ${H}^2

or:

git rev-parse ${name} ${name}^2

Note that if you have an arbitrary expression $expr that git rev-parse can turn into a hash ID, it's wise to do that once before adding the ^2 suffix to get the second parent. That's because some expressions will "consume" the suffix. For instance, the gitrevisions syntax :/fix nasty bug is a valid way to search for that text in a commit message. Adding ^2 produces :/fix nasty bug^2, which makes the ^2 part of the search pattern rather than finding the hash first and then moving to its second parent. So:

# Quote the expression: it may contain spaces (e.g. ":/fix nasty bug")
H=$(git rev-parse "${expr}") || exit
H2=$(git rev-parse "${H}^2") || exit

is a reliable way to write, in shell script, the commands to turn the expression into the desired two hash IDs.
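The two steps above can be wrapped in a small function that also reports when the commit is not a merge. This is a sketch under assumptions: the function name merge_and_second_parent and the error messages are invented here, and it uses the --verify and --quiet options of git rev-parse so that failures are reported by the function itself.

```shell
# Hypothetical helper (the name is an assumption): print a merge commit's
# hash followed by the hash of its second parent, one per line.
merge_and_second_parent() {
    # Resolve the expression to a hash first, so that a suffix is never
    # swallowed by syntax such as ":/some text".
    h=$(git rev-parse --verify --quiet "$1") || {
        echo "cannot resolve: $1" >&2
        return 1
    }
    # ^2 cannot be resolved if the commit has no second parent,
    # i.e. if it is not a merge commit.
    h2=$(git rev-parse --verify --quiet "$h^2") || {
        echo "not a merge commit: $h" >&2
        return 1
    }
    printf '%s\n%s\n' "$h" "$h2"
}
```

For example, merge_and_second_parent HEAD, run while a merged pull request is checked out on master, prints the merge commit and then the feature-branch tip, independent of how git log orders its output.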