
Git log and graph show different results after git rebase with merge commits


Setup

I have a scenario, as shown by the git log graph below.

$ git log --oneline --graph
*   1c3ae5de (HEAD -> feature) Merge feature-2 into feature
|\
| * 9051a8b8 (feature-2) Commit D
|/
*   940c88e0 Merge feature-1 into feature
|\
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 (main) Commit B
* 8704354e Initial commit

As you can hopefully see, there was a main branch which branched into the feature branch. On the feature branch, two separate branches feature-1 and feature-2 were branched off, work performed, and merged back into the feature branch with a merge commit.

The git log of this scenario is shown below as well.

$ git log --oneline
1c3ae5de (HEAD -> feature) Merge feature-2 into feature
9051a8b8 (feature-2) Commit D
940c88e0 Merge feature-1 into feature
60dca27d (feature-1) Commit C
8781f253 (main) Commit B
8704354e Initial commit

Issue

Now someone adds a new commit on main, and the log graph looks like the following [1].

* 8a16a716 (HEAD -> main) Commit E
| *   1c3ae5de (feature) Merge feature-2 into feature
| |\
| | * 9051a8b8 (feature-2) Commit D
| |/
| * 940c88e0 Merge feature-1 into feature
|/|
| * 60dca27d (feature-1) Commit C
|/
* 8781f253 Commit B
* 8704354e Initial commit

What I would like to do is rebase the feature branch onto main while preserving the merge commits. This is achieved by executing git rebase --rebase-merges main while on the feature branch. However, after performing this action, the outputs of git log --oneline and git log --oneline --graph do not agree on the order of the commits.

$ git log --oneline --graph
*   24e18500 (HEAD -> feature) Merge feature-2 into feature
|\
| * 33da98a4 Commit D
|/
*   a1ce7cd3 Merge feature-1 into feature
|\
| * bd23945b Commit C
|/
* 8a16a716 (main) Commit E
* 8781f253 Commit B
* 8704354e Initial commit
$ git log --oneline
24e18500 (HEAD -> feature) Merge feature-2 into feature
a1ce7cd3 Merge feature-1 into feature
33da98a4 Commit D
bd23945b Commit C
8a16a716 (main) Commit E
8781f253 Commit B
8704354e Initial commit

Question

Why does the output of git log --oneline show all the merge commits at the top of the list after a git rebase --rebase-merges? And why does that order disagree with the output in graph form?

[1] As an ancillary question, I don't understand why the graph shows a / from the main line branch to the 940c88e0 commit. If anyone could answer that as well, I'd be much obliged.


Solution

  • The root of the problem is that git log must sort commits in some cases.

    Remember that git log is displaying one commit at a time. Meanwhile, history, in a Git repository, consists of the commits themselves and their connections. For instance, in the initial graph output:

    *   1c3ae5de (HEAD -> feature) Merge feature-2 into feature
    |\
    | * 9051a8b8 (feature-2) Commit D
    |/
    *   940c88e0 Merge feature-1 into feature
    |\
    | * 60dca27d (feature-1) Commit C
    |/
    * 8781f253 (main) Commit B
    * 8704354e Initial commit
    

    commit 1c3ae5de has two parents: 940c88e0 (first parent) and 9051a8b8 (second parent). Commit 9051a8b8 has just one parent, 940c88e0. This pattern repeats a bit, and when we get to 8781f253 (at the tip of main), that commit has one parent, 8704354e, and commit 8704354e has no parents.

    When we have no merge commits at all—just a simple linear chain of commits—we have a nice simple picture:

    ... <-F <-G <-H   <--latest
    

    The branch name, here latest, helps Git find commit H quickly. Commit H provides, via the commit object, a snapshot (an archive of all files) and metadata, and the metadata for commit H list the author and date and log message and so on, but also list the raw hash ID of earlier commit G. This allows Git to use H to find G.

    Commit G is of course a commit, and as such has both snapshot and metadata. The metadata for G lists earlier commit F's raw hash ID, which allows Git to use G to find F. Commit F, being a commit, has snapshot and metadata, and onwards (or backwards) we (or Git) go(es). Eventually we reach the very first commit, which—being first—has an empty list of previous commits and therefore Git gets to stop going backwards.
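The backwards walk along a linear chain can be modeled in a few lines. This is only a sketch: the dictionary PARENT and the function walk_back are hypothetical stand-ins for Git's object database and its traversal, and F is treated as the root commit for brevity even though the drawing above has earlier commits to its left.

```python
# Hypothetical stand-in for the object database: each commit maps to the
# hash ID of its single parent, or None for the very first (root) commit.
PARENT = {"H": "G", "G": "F", "F": None}


def walk_back(tip, parent_of):
    """Return the commits from tip back to the root, newest first."""
    order = []
    commit = tip
    while commit is not None:
        order.append(commit)          # "show" this commit
        commit = parent_of[commit]    # step backwards to its parent
    return order


print(walk_back("H", PARENT))  # ['H', 'G', 'F']
```

The branch name only supplies the starting point (H here); everything else comes from the parent links stored in the commits themselves.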

    For git log to do its job (showing some commits), we tell git log which commit(s) we wish it to start with, via command line arguments:

    git log main
    

    or:

    git log feature
    

    for instance, or we let it default to starting with HEAD, or we use --all or --branches or something to select more than one starting point, or we can even give git log multiple raw hash IDs (though of course we probably have to run git log first to get them 😀).

    We can also use options (e.g., --no-merges or --max-parents=1, which mean the same thing) to prevent showing certain commits, and more options (e.g., --first-parent) to modify how git log follows commits backwards. For the moment, though, let's assume we're not doing any of this: we're having git log show every commit it visits, during this backwards walk, and not directing it to skip parts of the graph.

    As long as we're letting git log start with one commit (and not modifying the graph-walk), that's the commit it shows first. Having shown that commit, git log moves backwards to its parent (singular—there's trouble ahead but let's not get there yet) and shows that one. Having shown the parent, git log moves backwards one step yet again, and so on. That defines the order in which we see the commits: one at a time, backwards, along this simple linear chain.

    But: what if we tell git log to show two commits, say, 60dca27d and 9051a8b8, via git log feature-1 feature-2? Now git log has two commits to show initially. What order should it use to show them?

    While—or maybe before—we think too hard about that one, let's consider a branch-y history like yours only different:

         C--D
        /    \
    A--B      G--H   <-- main
        \    /
         E--F
    

    There's only one obvious place to start, from branch name main which selects commit H, so we'll have git log start there and work backwards, to commit G. But commit G is a merge commit, whose very definition is that it has two or more parents [1]. Which one should git log visit first? And if we pick either D or F first, which one should git log visit second? Should it keep going down that "leg" of the merge, or should it switch to the other "leg", or what?


    [1] Git calls a merge commit with more than two parents an octopus merge, which is why the logo for GitHub is the octocat.


    Git's answer to the order dilemma

    There are a lot of ways to handle this, but Git's is ... complicated. There are sorting options, and filtering options as well, and all of them work based off a priority queue algorithm. The actual log-walk goes like this:

    1. Create an empty queue.
    2. Insert, into the queue, in priority order, all commits named on the command line.
    3. While the queue is not empty:
      1. Take the highest priority item off the queue and visit it (optionally showing it).
      2. Insert all or selected parents into the queue, provided they have not been visited before.

    So it's the priority in the queue that determines which commits we see when. As long as the queue itself has just one commit in it, we see that one commit, and then insert its parent(s) into the now-empty queue. As long as that one commit has one parent, we get the simple linear walk.
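The queue-driven walk can be sketched in Python on the branch-y A..H graph drawn earlier. This is a simplified model, not Git's code: log_walk, PARENTS and DATE are hypothetical names, the timestamps are made up, and a commit is marked as seen when it enters the queue so that it is never inserted twice.

```python
import heapq

# The branch-y example graph: commit -> list of parents (first parent first).
PARENTS = {
    "H": ["G"], "G": ["D", "F"],
    "D": ["C"], "C": ["B"],
    "F": ["E"], "E": ["B"],
    "B": ["A"], "A": [],
}
# Made-up committer timestamps; a larger number means a newer commit.
DATE = {"A": 1, "B": 2, "C": 3, "E": 4, "D": 5, "F": 6, "G": 7, "H": 8}


def log_walk(starts, parents, priority):
    """Visit commits highest-priority first, starting from `starts`."""
    queue = []
    seen = set()
    for commit in starts:                 # step 2: seed the queue
        if commit not in seen:
            seen.add(commit)
            heapq.heappush(queue, (-priority[commit], commit))
    shown = []
    while queue:                          # step 3: drain the queue
        _, commit = heapq.heappop(queue)  # highest priority = newest date
        shown.append(commit)
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-priority[parent], parent))
    return shown


print(log_walk(["H"], PARENTS, DATE))
# ['H', 'G', 'F', 'D', 'E', 'C', 'B', 'A']
```

With these invented dates the two legs of the merge come out interleaved (F, D, E, C), which is exactly the kind of output that is hard to read without a graph. Passing two starting points, as in log_walk(["D", "F"], PARENTS, DATE), answers the earlier "which one first?" question the same way: whichever has the higher priority.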

    The choice of priority is where git log's sorting options enter this picture. The first thing to note about it is that the default is a commit-date sort order, but --graph implies --topo-order, which the git log documentation tells us means:

    Show no parents before all of its children are shown, and avoid showing commits on multiple lines of history intermixed.

    So with --topo-order and the graph I drew, we'll see H, then G, then one of D or F, then the parent of whichever we just saw, then the other of D or F, then the parent of that commit, and only then can git log visit commit B, after which only commit A is in the queue.
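One way to get an ordering with these two properties is to count, for each commit, how many of its children have not been shown yet, and to keep ready commits on a last-in-first-out stack. The sketch below does that for a single starting commit; topo_walk and PARENTS are hypothetical names, and the choice of which leg comes first is an accident of the stack order here, not a documented guarantee.

```python
# The branch-y example graph: commit -> list of parents (first parent first).
PARENTS = {
    "H": ["G"], "G": ["D", "F"],
    "D": ["C"], "C": ["B"],
    "F": ["E"], "E": ["B"],
    "B": ["A"], "A": [],
}


def topo_walk(start, parents):
    """Show every child before its parent, one leg of a merge at a time."""
    # Pass 1: count the not-yet-shown children of each reachable commit.
    children_left = {start: 0}
    reached = {start}
    todo = [start]
    while todo:
        commit = todo.pop()
        for parent in parents[commit]:
            children_left[parent] = children_left.get(parent, 0) + 1
            if parent not in reached:
                reached.add(parent)
                todo.append(parent)

    # Pass 2: a commit becomes ready only once all its children are shown.
    # Popping from the end of the list keeps us on the current leg.
    stack = [start]
    shown = []
    while stack:
        commit = stack.pop()
        shown.append(commit)
        for parent in parents[commit]:
            children_left[parent] -= 1
            if children_left[parent] == 0:
                stack.append(parent)
    return shown


print(topo_walk("H", PARENTS))
# ['H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']
```

Commit B has two children (C and E), so it stays out of the stack until both legs have been emitted, which is why it cannot appear in the middle of the output.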

    Without --topo-order, Git uses the committer date of each commit while working through the fork in the graph that occurs when we move backwards through the merge commit. The committer date is renewed during a git rebase, but the granularity of these time stamps is only one second, so an automated rebase often has a tie [2]. With --topo-order, though, the priority will force one entire leg of the merge to be emitted before the other leg is started. There's no real promise about which leg comes first here, but the current --graph code itself depends on second-parent coming first, I think.


    [2] The tie will be broken by some unspecified means. It depends on how whoever wrote the priority queue chose to insert equal-priority items. Right now, this will at least be deterministic, but some future Git version that uses multiple CPUs in parallel might produce timing-dependent ordering.
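To see how a tie could produce the order in the question, here is a sketch that replays the date-ordered walk on the rebased graph. The parent lists are read off the git log --graph output in the question; the timestamps in TIED and DISTINCT are invented, and breaking ties in insertion order is an assumption of this sketch, since the real tie-break is unspecified.

```python
import heapq
from itertools import count

# Parent lists read from the rebased graph (first parent first).
PARENTS = {
    "24e18500": ["a1ce7cd3", "33da98a4"],  # Merge feature-2 into feature
    "33da98a4": ["a1ce7cd3"],              # Commit D
    "a1ce7cd3": ["8a16a716", "bd23945b"],  # Merge feature-1 into feature
    "bd23945b": ["8a16a716"],              # Commit C
    "8a16a716": ["8781f253"],              # Commit E
    "8781f253": ["8704354e"],              # Commit B
    "8704354e": [],                        # Initial commit
}
# Invented timestamps: all four rebased commits land in the same second.
TIED = {"24e18500": 300, "33da98a4": 300, "a1ce7cd3": 300, "bd23945b": 300,
        "8a16a716": 200, "8781f253": 100, "8704354e": 0}
# Same history, but with each rebased commit one second apart.
DISTINCT = dict(TIED, **{"bd23945b": 300, "a1ce7cd3": 301,
                         "33da98a4": 302, "24e18500": 303})


def date_order_walk(start, parents, dates):
    """Newest-first walk; equal dates come out in insertion order (assumed)."""
    ticket = count()
    queue = [(-dates[start], next(ticket), start)]
    seen = {start}
    shown = []
    while queue:
        _, _, commit = heapq.heappop(queue)
        shown.append(commit)
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-dates[parent], next(ticket), parent))
    return shown


print(date_order_walk("24e18500", PARENTS, TIED))
# ['24e18500', 'a1ce7cd3', '33da98a4', 'bd23945b',
#  '8a16a716', '8781f253', '8704354e']
print(date_order_walk("24e18500", PARENTS, DISTINCT))
# ['24e18500', '33da98a4', 'a1ce7cd3', 'bd23945b',
#  '8a16a716', '8781f253', '8704354e']
```

Under these assumptions the tied run puts both merge commits first, matching the plain git log --oneline output in the question, while the run with distinct timestamps matches the order seen with --graph.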


    That answers your first question

    The --graph option forces --topo-order which forces a different commit order output. Any time you have a complex graph, I recommend using --graph (or a graphical viewer). Note that other software authors, e.g., gitk or other GUIs, may use their own (different) sorting methods for drawing the graph. As long as they draw a correct graph that you can interpret correctly, this isn't all that important.

    Your second question

    [After someone adds a new commit on main, the graph is]

    * 8a16a716 (HEAD -> main) Commit E
    | *   1c3ae5de (feature) Merge feature-2 into feature
    | |\
    | | * 9051a8b8 (feature-2) Commit D
    | |/
    | * 940c88e0 Merge feature-1 into feature
    |/|
    | * 60dca27d (feature-1) Commit C
    |/
    * 8781f253 Commit B
    * 8704354e Initial commit
    

    ... why [does] the graph show a / from the main line branch to the 940c88e0 commit

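    It doesn't. The / runs the other way: from commit 940c88e0 back to the main line. One parent of 940c88e0 is 8781f253, which is your Commit B. Your first graph draws that link as the vertical line, so it is the first parent; here 8781f253 is already being drawn in the leftmost column, so the link slants left to join that column. Likewise, the / under 9051a8b8 runs from that commit to its (first and only) parent. In particular though, there is no connection from commit 8a16a716 (on main) back to these commits: there's only a straight-line connection down to 8781f253.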

    The positions and directions of the markers (|, \, and /) in git log --graph output are significant: at a merge commit we always see one line with * and then one line with |\ or /| underneath. (I think git log --graph used to use |\ always, and then reverse the \, but I might be mis-remembering.)

    The vertical line normally marks the connection to the first parent. The diagonal line, \ or /, normally marks the connection to the second parent (and for octopus merges, to additional parents as needed). The exception is a first parent that is already drawn in a column further left, whose line slants over to join that column. The reason this matters is ... well, first, let's back up a bit and say that sometimes it does not matter.

    When you run git merge you do so by:

    • checking out some particular commit, so that it is now HEAD (and you're on that branch, assuming you use a branch name to get here); then
    • running git merge other or similar, where other specifies the other branch-tip commit.

    The git rebase --rebase-merges code is no different here. Rebase copies ordinary non-merge commits using git cherry-pick, but literally re-runs git merge to re-perform the merges [3]. So these merges also first switch to some branch, then do the merge.

    We know how Git merges actually work:

              I--J   <-- br1 (HEAD)
             /
    ...--G--H
             \
              K--L   <-- br2
    

    Here, we're on branch br1 when we run git merge br2. So Git:

    • locates the merge base commit: the best commit that is on both branches, which is commit H in this case;

    • in effect, runs two git diffs, from merge base to each branch tip;

    • combines the diff results to get the set of changes to apply to the snapshot in the merge base;

    • applies the combined changes to that snapshot to get the merge snapshot; and

    • makes a new merge commit M:

                I--J
               /    \
      ...--G--H      M   <-- br1 (HEAD)
               \    /
                K--L   <-- br2
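The steps above can be sketched at whole-file granularity. This is a deliberate simplification: real Git combines changes within files, hunk by hunk, whereas the hypothetical three_way_merge below treats each file as an opaque value. The snapshots BASE_H, OURS_J and THEIRS_L are invented for illustration.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (path -> content) against their merge base.

    Returns (merged_snapshot, list_of_conflicting_paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (same change, or no change)
            result = o
        elif o == b:      # only their side changed this path
            result = t
        elif t == b:      # only our side changed this path
            result = o
        else:             # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the path is deleted
            merged[path] = result
    return merged, conflicts


# Invented snapshots for commits H (merge base), J (br1) and L (br2).
BASE_H = {"README": "v1", "main.c": "v1"}
OURS_J = {"README": "v2", "main.c": "v1"}
THEIRS_L = {"README": "v1", "main.c": "v1", "util.c": "new"}

print(three_way_merge(BASE_H, OURS_J, THEIRS_L))
# ({'README': 'v2', 'main.c': 'v1', 'util.c': 'new'}, [])
```

Swapping the last two arguments gives the same snapshot in this model, which is the sense in which the order of the two sides does not affect the merged content.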
      

    As usual, making a new commit causes the current branch name to point to the new commit, so br1 now selects merge commit M. The snapshot for M holds the snapshot made by merging the two diffs and applying that to the snapshot from H. The two parents are commits J and L. It won't matter which order we used when combining those diffs (unless we added -X ours or -X theirs or used -s ours, that is!), so does it matter which parent is the first parent?

    Well, git log has --first-parent, which tells it that, during the walk through the graph, Git should only add the first parent to the priority queue. So if we'll use this option later (will we?), then suddenly it will matter which parent is the first parent.

    We're guaranteed that the first parent of any merge will be the commit we were "on", from the branch we were "on". That is, the first parent of M will definitely be J, because we were (and still are) on branch br1 and its tip was J before the merge action.

    So | tells us which parent was the "main line of development" and \ or / tells us which parent was the other commit. That side branch will be omitted from the log output entirely if we use --first-parent.
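A first-parent-only walk over the merged br1/br2 graph can be sketched like this. The names first_parent_walk and PARENTS are hypothetical, and G is treated as the root commit for brevity even though the drawing has earlier history to its left.

```python
# The merged graph from above: commit -> list of parents (first parent first).
PARENTS = {
    "M": ["J", "L"],            # merge made while on br1: J first, L second
    "J": ["I"], "I": ["H"],     # the br1 side
    "L": ["K"], "K": ["H"],     # the br2 side
    "H": ["G"], "G": [],
}


def first_parent_walk(tip, parents):
    """Follow only the first parent of each commit, ignoring side branches."""
    shown = []
    commit = tip
    while commit is not None:
        shown.append(commit)
        plist = parents[commit]
        commit = plist[0] if plist else None
    return shown


print(first_parent_walk("M", PARENTS))  # ['M', 'J', 'I', 'H', 'G']
```

Commits K and L never appear: they are reachable only through the second parent of M, so a walk restricted to first parents never visits them.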


    [3] This is important if you've used -s ours or -X ours or -X theirs, for instance. Git does not remember the options used and will re-perform the merge without using those options. Be careful with --rebase-merges! The documentation will no doubt be updated if and when this changes, but right now it says, in part:

    By default, the merge command will use the ort merge strategy for regular merges, and octopus for octopus merges. One can specify a default strategy for all merges using the --strategy argument when invoking rebase, or can override specific merges in the interactive list of commands by using an exec command to call git merge explicitly with a --strategy argument.

    The details are a bit messy and have already evolved from the time --rebase-merges was first introduced, so be sure to consult the documentation frequently.