I have a scenario, as shown by the git log graph below.
$ git log --oneline --graph
* 1c3ae5de (HEAD -> feature) Merge feature-2 into feature
|\
| * 9051a8b8 (feature-2) Commit D
|/
* 940c88e0 Merge feature-1 into feature
|\
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 (main) Commit B
* 8704354e Initial commit
As you can hopefully see, there was a main branch which branched into the feature branch. On the feature branch, two separate branches, feature-1 and feature-2, were branched off, work was performed, and each was merged back into the feature branch with a merge commit.
The git log of this scenario is shown below as well.
$ git log --oneline
1c3ae5de (HEAD -> feature) Merge feature-2 into feature
9051a8b8 (feature-2) Commit D
940c88e0 Merge feature-1 into feature
60dca27d (feature-1) Commit C
8781f253 (main) Commit B
8704354e Initial commit
Now, someone commits a new commit on main and suddenly the log graph looks like the following [1].
* 8a16a716 (HEAD -> main) Commit E
| * 1c3ae5de (feature) Merge feature-2 into feature
| |\
| | * 9051a8b8 (feature-2) Commit D
| |/
| * 940c88e0 Merge feature-1 into feature
|/|
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 Commit B
* 8704354e Initial commit
What I would like to do is rebase the feature branch onto main and preserve merge commits. This is achieved by executing git rebase --rebase-merges main while on the feature branch. However, after performing this action, the outputs of git log --oneline and git log --oneline --graph do not agree in terms of the order of the commits.
$ git log --oneline --graph
* 24e18500 (HEAD -> feature) Merge feature-2 into feature
|\
| * 33da98a4 Commit D
|/
* a1ce7cd3 Merge feature-1 into feature
|\
| * bd23945b Commit C
|/
* 8a16a716 (main) Commit E
* 8781f253 Commit B
* 8704354e Initial commit
$ git log --oneline
24e18500 (HEAD -> feature) Merge feature-2 into feature
a1ce7cd3 Merge feature-1 into feature
33da98a4 Commit D
bd23945b Commit C
8a16a716 (main) Commit E
8781f253 Commit B
8704354e Initial commit
Why does the output of git log --oneline show all the merge commits at the top of the list after a git rebase --rebase-merges? And why does that order disagree with the output in graph form?
[1] As an ancillary question, I don't understand why the graph shows a / from the main line branch to the 940c88e0 commit. If anyone could answer that as well, I'd be much appreciative.
The root of the problem is that git log must sort commits in some cases.
Remember that git log is displaying one commit at a time. Meanwhile, history, in a Git repository, consists of the commits themselves and their connections. For instance, in the initial graph output:
* 1c3ae5de (HEAD -> feature) Merge feature-2 into feature
|\
| * 9051a8b8 (feature-2) Commit D
|/
* 940c88e0 Merge feature-1 into feature
|\
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 (main) Commit B
* 8704354e Initial commit
commit 1c3ae5de has two parent commits, 940c88e0 (first parent) and 9051a8b8 (second parent). Commit 9051a8b8 has just one parent, 940c88e0. This pattern repeats a bit, and when we get to 8781f253 (at the tip of main), that commit has one parent, 8704354e, and commit 8704354e has no parents.
When we have no merge commits at all—just a simple linear chain of commits—we have a nice simple picture:
... <-F <-G <-H <--latest
The branch name, here latest, helps Git find commit H quickly. Commit H provides, via the commit object, a snapshot (an archive of all files) and metadata. The metadata for commit H lists the author and date and log message and so on, but also lists the raw hash ID of earlier commit G. This allows Git to use H to find G.
Commit G is of course a commit, and as such has both snapshot and metadata. The metadata for G lists earlier commit F's raw hash ID, which allows Git to use G to find F. Commit F, being a commit, has snapshot and metadata, and onwards (or backwards) we (or Git) go(es). Eventually we reach the very first commit, which, being first, has an empty list of previous commits, and therefore Git gets to stop going backwards.
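The backwards walk along a linear chain can be sketched in a few lines of Python. This is only a toy model, not how Git stores anything: the parents and branches dictionaries are hypothetical, and commit F is treated as the root even though the drawing above has earlier history to its left.

```python
# Hypothetical toy model: each commit maps to its single parent (None at the root).
parents = {"H": "G", "G": "F", "F": None}
# A branch name simply records the hash of the tip commit.
branches = {"latest": "H"}

def walk_linear(branch):
    """Follow parent links backwards from the branch tip, newest first."""
    order = []
    commit = branches[branch]
    while commit is not None:
        order.append(commit)
        commit = parents[commit]  # step back one commit
    return order

print(walk_linear("latest"))
```

The loop stops when it reaches a commit with no parent, which is the "Git gets to stop going backwards" case.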
For git log to do its job (show some commits), we tell git log which commit(s) we wish it to start with, via command line arguments:
git log main
or:
git log feature
for instance, or we let it default to starting with HEAD, or we use --all or --branches or something to select more than one starting point, or we can even give git log multiple raw hash IDs (though of course we probably have to run git log first to get them 😀).
We can also use options (e.g., --no-merges or --max-parents=1, which mean the same thing) to prevent showing certain commits, and more options (e.g., --first-parent) to modify how git log follows commits backwards. For the moment, though, let's assume we're not doing any of this: we're having git log show every commit it visits, during this backwards walk, and not directing it to skip parts of the graph.
As long as we're letting git log start with one commit (and not modifying the graph-walk), that's the commit it shows first. Having shown that commit, git log moves backwards to its parent (singular; there's trouble ahead but let's not get there yet) and shows that one. Having shown the parent, git log moves backwards one step yet again, and so on. That defines the order in which we see the commits: one at a time, backwards, along this simple linear chain.
But: what if we tell git log to show two commits, say, 60dca27d and 9051a8b8, via git log feature-1 feature-2? Now git log has two commits to show initially. What order should it use to show them?
While—or maybe before—we think too hard about that one, let's consider a branch-y history like yours only different:
     C--D
    /    \
A--B      G--H   <-- main
    \    /
     E--F
There's only one obvious place to start (from branch name main, which selects commit H), so we'll have git log start there and work backwards, to commit G. But commit G is a merge commit, whose very definition is that it has two or more parents [1]. Which one should git log visit first? And if we pick either D or F first, which one should git log visit second? Should it keep going down that "leg" of the merge, or should it switch to the other "leg", or what?
[1] Git calls a merge commit with more than two parents an octopus merge, which is why the logo for GitHub is the octocat.
There are a lot of ways to handle this, but Git's is ... complicated. There are sorting options, and filtering options as well, and all of them work based off a priority queue algorithm. The actual log-walk goes like this:
So it's the priority in the queue that determines which commits we see when. As long as the queue itself has just one commit in it, we see that one commit, and then insert its parent(s) into the now-empty queue. As long as that one commit has one parent, we get the simple linear walk.
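A rough sketch of such a queue-driven walk, in Python, applied to the rebased history from the question. Everything here is an assumption made for illustration: the timestamps are invented (the four rebased commits share one value, as if the rebase finished within a single second), and ties are broken by insertion order. It is a model of the idea, not Git's actual code.

```python
import heapq

# Hypothetical model of the rebased history: hash -> (committer time, parents).
# The times are invented; the four rebased commits deliberately share one value.
commits = {
    "8704354e": (100, []),
    "8781f253": (200, ["8704354e"]),
    "8a16a716": (300, ["8781f253"]),
    "bd23945b": (400, ["8a16a716"]),
    "a1ce7cd3": (400, ["8a16a716", "bd23945b"]),
    "33da98a4": (400, ["a1ce7cd3"]),
    "24e18500": (400, ["a1ce7cd3", "33da98a4"]),
}

def date_order_walk(start):
    """Show commits newest-first using a priority queue keyed on committer time."""
    queue, queued, shown = [], set(), []
    counter = 0  # assumed tie-break: among equal times, earlier insertion wins

    def push(name):
        nonlocal counter
        if name not in queued:  # never queue the same commit twice
            queued.add(name)
            # heapq is a min-heap, so negate the time to pop the newest first
            heapq.heappush(queue, (-commits[name][0], counter, name))
            counter += 1

    push(start)
    while queue:
        _, _, name = heapq.heappop(queue)
        shown.append(name)
        for parent in commits[name][1]:
            push(parent)
    return shown

for name in date_order_walk("24e18500"):
    print(name)
```

With these assumed timestamps and this tie-break, the sketch prints both merge commits first, then Commit D and Commit C, which is the same order as the git log --oneline output in the question.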
The choice of priority is where git log's sorting options enter this picture. The first thing to note about it is that the default is a commit-date sort order, but --graph implies --topo-order, which the git log documentation tells us means:
Show no parents before all of its children are shown, and avoid showing commits on multiple lines of history intermixed.
So with --topo-order and the graph I drew, we'll see H, then G, then one of D or F, then the parent of whichever we just saw, then the other of D or F, then the parent of that commit, and only then can git log visit commit B, after which only commit A is in the queue.
Without --topo-order, Git uses the committer date of each commit while working through the fork in the graph that occurs when we move backwards through the merge commit. The committer date is renewed during a git rebase, but the granularity of these time stamps is only one second, so an automated rebase often has a tie [2]. With --topo-order, though, the priority will force one entire leg of the merge to be emitted before the other leg is started. There's no real promise about which leg comes first here, but the current --graph code itself depends on second-parent coming first, I think.
[2] The tie will be broken by some unspecified means. It depends on how whoever wrote the priority queue chose to insert equal-priority items. Right now, this will at least be deterministic, but some future Git version that uses multiple CPUs in parallel might produce timing-dependent ordering.
The --graph option forces --topo-order, which forces a different commit order output. Any time you have a complex graph, I recommend using --graph (or a graphical viewer). Note that other software authors, e.g., gitk or other GUIs, may use their own (different) sorting methods for drawing the graph. As long as they draw a correct graph that you can interpret correctly, this isn't all that important.
[After] executing git rebase --rebase-merges main while on the feature branch [the graph is]
* 8a16a716 (HEAD -> main) Commit E
| * 1c3ae5de (feature) Merge feature-2 into feature
| |\
| | * 9051a8b8 (feature-2) Commit D
| |/
| * 940c88e0 Merge feature-1 into feature
|/|
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 Commit B
* 8704354e Initial commit
... why [does] the graph show a / from the main line branch to the 940c88e0 commit
It doesn't. It shows that going from commit 940c88e0 back to the main line. The first parent of 940c88e0 is 8781f253, which is your Commit B.
It also shows that going from 9051a8b8 to its (first and only) parent. In particular though, there is no connection from commit 8a16a716 (on main) back to these commits: there's only a straight-line connection down to 8781f253.
The positions and directions of the markers (|, \, and /) in git log --graph output are significant: at a merge commit we always see one line with * and then one line with |\ or /| underneath. (I think git log --graph used to use |\ always, and then reverse the \, but I might be mis-remembering.)
Normally, the vertical line marks the connection to the first parent. The diagonal line, \ or /, marks the connection to the second parent (and for octopus merges, to additional parents as needed). The reason this matters is ... well, first, let's back up a bit and say that sometimes it does not matter.
When you run git merge you do so by:
checking out some commit so that it is HEAD (and you're on that branch, assuming you use a branch name to get here); then
running git merge other or similar, where other specifies the other branch-tip commit.
The git rebase --rebase-merges code is no different here. Rebase copies ordinary non-merge commits using git cherry-pick, but literally re-runs git merge to re-perform the merges [3]. So these merges also first switch to some branch, then do the merge.
We know how Git merges actually work:
          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2
Here, we're on branch br1 when we run git merge br2. So Git:
locates the merge base commit: the best commit on both branches is commit H in this case;
in effect, runs two git diffs, from merge base to each branch tip;
combines the diff results to get the set of changes to apply to the snapshot in the merge base;
applies the combined changes to that snapshot to get the merge snapshot; and
makes a new merge commit M:
          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2
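The first step, locating the merge base, can be illustrated with a small Python sketch. It assumes a hypothetical parents dictionary for the graph above (with G treated as the root, although the drawing has earlier history), and it reads "best" as "a common ancestor that no other common ancestor descends from". This is an illustration of the idea, not Git's algorithm.

```python
# Hypothetical parent lists for the graph above, before the merge is made.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}

def ancestors(parents, tip):
    """Return the tip and every commit reachable from it."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }

print(merge_bases(parents, "J", "L"))
```

Here G and H are both on both branches, but G is an ancestor of H, so only H is kept as the merge base.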
As usual, making a new commit causes the current branch name to point to the new commit, so br1 now selects merge commit M. The snapshot for M holds the snapshot made by merging the two diffs and applying that to the snapshot from H. The two parents are commits J and L. It won't matter which order we used when combining those diffs (unless we added -X ours or -X theirs or used -s ours, that is!), so does it matter which parent is the first parent?
Well, git log has --first-parent, which tells it that, during the walk through the graph, Git should only add the first parent to the priority queue. So if we use this option later (will we?), then suddenly it will matter which parent is the first parent.
We're guaranteed that the first parent of any merge will be the commit we were "on", from the branch we were "on". That is, the first parent of M will definitely be J, because we were (and still are) on branch br1 and its tip was J before the merge action.
So | tells us which parent was the "main line of development" and \ or / tells us which parent was the other commit. That side branch will be omitted from the log output entirely if we use --first-parent.
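The effect of following only first parents can be sketched in Python as well. The parents dictionary below is a hypothetical model of the merged graph, with the first parent listed first and G treated as the root; it only illustrates the idea and does not reproduce how git log implements the option.

```python
# Hypothetical parent lists for the merged graph; first parent comes first.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],  # made while on br1, so J is the first parent
}

def first_parent_walk(parents, tip):
    """Walk backwards from tip, following only the first parent at each step."""
    shown = []
    commit = tip
    while commit is not None:
        shown.append(commit)
        commit_parents = parents[commit]
        commit = commit_parents[0] if commit_parents else None
    return shown

print(first_parent_walk(parents, "M"))
```

Commits K and L never appear in the result, because they are reachable from M only through its second parent.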
[3] This is important if you've used -s ours or -X ours or -X theirs, for instance. Git does not remember the options used and will re-perform the merge without using those options. Be careful with --rebase-merges! The documentation will no doubt be updated if and when this changes, but right now it says, in part:
By default, the merge command will use the ort merge strategy for regular merges, and octopus for octopus merges. One can specify a default strategy for all merges using the --strategy argument when invoking rebase, or can override specific merges in the interactive list of commands by using an exec command to call git merge explicitly with a --strategy argument.
The details are a bit messy and have already evolved from the time --rebase-merges was first introduced, so be sure to consult the documentation frequently.