
Git: Why does previous fetch+merge commit disappear after fetch+rebase


TL;DR: I fetch remote changes into a local Git repo and merge them. Some time later I fetch new changes, but this time I rebase instead of merging, and the previously created merge commit disappears. Why?

Example

Consider the following starting point, created by the command git log --all --graph --decorate --oneline:

* 28992d3 (repo1/master) hello4
* 3610bdf hello3
| * f113d63 (HEAD -> master) bye-bye
| * cabc896 bye
|/  
* 75f7ca9 hello2
* 525cb4a hello1

I.e., there is a Git repo with a master branch that has some local, unpushed changes. Some other changes have just been fetched from the remote (in this case repo1).

Next command: git merge repo1/master. Result:

*   b94aa29 (HEAD -> master) Merge remote-tracking branch 'repo1/master'
|\  
| * 28992d3 (repo1/master) hello4
| * 3610bdf hello3
* | f113d63 bye-bye
* | cabc896 bye
|/  
* 75f7ca9 hello2
* 525cb4a hello1

Now let's say there are some new commits both locally and in the remote repo1, and the remote contents are fetched again via git fetch repo1 master. The result looks like this:

* 2e3d749 (repo1/master) hello6
* b17983d hello5
| * 2e49819 (HEAD -> master) see ya
| * c2f2d5a good-bye
| *   b94aa29 Merge remote-tracking branch 'repo1/master'
| |\  
| |/  
|/|   
* | 28992d3 hello4
* | 3610bdf hello3
| * f113d63 bye-bye
| * cabc896 bye
|/  
* 75f7ca9 hello2
* 525cb4a hello1

So far so good. Now let's do git rebase repo1/master, and the result is a nice, linear commit log:

* 101e524 (HEAD -> master) see ya
* 3ce7543 good-bye
* 849cbd4 bye-bye
* 483bab8 bye
* 2e3d749 (repo1/master) hello6
* b17983d hello5
* 28992d3 hello4
* 3610bdf hello3
* 75f7ca9 hello2
* 525cb4a hello1

Question: where did the commit b94aa29 Merge remote-tracking branch 'repo1/master' go? (As far as I can see, it was not preserved even as a "dead" commit, the way commits made on a detached HEAD are.)

Remarks:

  • I guess that the answer must lie along the lines of "git notices we don't need b94aa29 anymore, because we will have all its contents anyway", but can you please explain in more detail what is going on? Also, is it always true that rebasing onto a previously merged branch throws away all merge commits?
  • It would be nice to know whether you can somehow force the merge commit to remain.
  • If the example can be simplified, I'm willing to edit the question.

Solution

  • TL;DR version of answer

    git rebase functionally means:

    • pick out some set of commits to copy;
    • copy those commits, one at a time, as if by git cherry-pick;
    • when done, change the current branch name, whatever that is, to point to the final copied commit.

    The copying literally can't copy merges, so it usually doesn't bother trying.
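The three steps above can be tried by hand. What follows is a minimal sketch, not what git rebase runs internally: it builds a throwaway repository (the branch names, file names and the commit helper are illustrative assumptions) and then redoes a rebase of topic onto mainline with git rev-list, git cherry-pick and git checkout -B.

```shell
# Sketch only: emulate `git rebase mainline` with cherry-pick, in a scratch repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b topic
commit A
commit B
git checkout -q mainline
commit main2

# 1. Pick out the commits to copy: on topic, not on mainline, oldest first.
#    Merges are left out, as a plain rebase does.
to_copy=$(git rev-list --reverse --no-merges mainline..topic)

# 2. Copy them one at a time onto the tip of mainline, on a detached HEAD.
git checkout -q --detach mainline
for c in $to_copy; do
    git cherry-pick "$c" >/dev/null
done

# 3. Move the branch name to the last copied commit and check it out.
git checkout -q -B topic

subjects=$(echo $(git log --format=%s topic))
current=$(git rev-parse --abbrev-ref HEAD)
echo "$current: $subjects"
```

The original A and B are still in the repository afterwards; only the name topic has moved to the copies.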

    Longer (but see other answers of mine for much more)

    The general idea here is to take a series of commits:

                 A--B--C--D   <-- topic (HEAD)
                /
    ...--o--o--*--o--o   <-- mainline
    

    and transplant them to a series of new-and-improved commits:

                 A--B--C--D   [abandoned]
                /
    ...--o--o--*--o--o   <-- mainline
                      \
                       A'-B'-C'-D'  <-- topic (HEAD)
    

    The "improvement" is to base the new chain on the tip of some other branch, such as mainline. To make this happen, Git literally must copy the original commits (A-B-C-D, here) to different commits that have different hash IDs, because every commit, once made, is permanent[1] and set in stone. A commit that differs by even a single bit gets a new, different hash ID, even if the only difference is the parent ID stored in the new commit. So even if the source tree in snapshot A' matches the source tree in snapshot A (and it probably doesn't), the commit ID for A' is different from the commit ID for A.

    (This carries on through the rest of the commits as well, of course.)
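To see that the parent alone is enough to change a commit's hash ID, here is a small sketch using the plumbing command git commit-tree in a scratch repository. The file name and messages are made up for the illustration; the two commits it creates share one tree and one message and differ only in their parent.

```shell
# Sketch only: same tree, same message, different parent => different commit ID.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > file.txt
git add file.txt
git commit -q -m first
first=$(git rev-parse HEAD)

git commit -q --allow-empty -m second      # same snapshot, one commit later
second=$(git rev-parse HEAD)

tree=$(git rev-parse 'HEAD^{tree}')

# Build two commit objects directly from the same tree.
copy1=$(git commit-tree "$tree" -p "$first" -m copy)
copy2=$(git commit-tree "$tree" -p "$second" -m copy)

tree1=$(git rev-parse "$copy1^{tree}")
tree2=$(git rev-parse "$copy2^{tree}")
echo "trees:   $tree1 $tree2"
echo "commits: $copy1 $copy2"
```

The two tree IDs printed are identical while the two commit IDs are not, which is the situation of A and A' when their snapshots happen to match.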

    The arguments you give to git rebase select:

    • which commits to copy, and
    • where to start the copies (where to put the first-copied commit).

    Normally you can get away with a single name for both of these. For instance, git rebase mainline means to put the copies after the commit to which mainline points, and to copy those commits that are reachable from the commit to which topic (the current branch name) points—i.e., D—excluding any commits reachable from the tip of mainline. The first commit that's not copied is commit *, where the two branches rejoin (in this case forever).
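The selection rule can be inspected without rebasing anything. As a rough sketch (scratch repository, illustrative names), the range mainline..topic lists the commits that git rebase mainline would copy, and git merge-base finds the commit drawn as * in the diagram:

```shell
# Sketch only: list what `git rebase mainline` would copy, without rebasing.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit star                      # the commit drawn as * above
git checkout -q -b topic
for name in A B C D; do commit "$name"; done
git checkout -q mainline
commit m1
commit m2

# Commits reachable from topic but not from mainline, oldest first.
to_copy=$(echo $(git log --reverse --format=%s mainline..topic))

# The first commit that is not copied: where the two histories meet.
limit=$(git log -1 --format=%s "$(git merge-base mainline topic)")

echo "copy: $to_copy"
echo "stop at: $limit"
```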

    In some cases, you may need to use git rebase --onto to separate the two notions. With --onto, you tell rebase where to put the copies, freeing up the remaining argument to mean what not to copy. That's not required here.
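For a case where the two notions do differ, here is a hedged sketch in a scratch repository. The tag name keep-below and all commit names are invented for the example: only C and D are copied onto mainline, while A and B are left out because they are reachable from the limit argument.

```shell
# Sketch only: --onto says where the copies go, the next argument what not to copy.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit star
git checkout -q -b topic
for name in A B C D; do commit "$name"; done
git tag keep-below topic~2       # points at B; B and everything before it stay behind
git checkout -q mainline
commit m1

# Copy keep-below..topic (C and D) so that the copies sit on top of mainline.
git rebase -q --onto mainline keep-below topic

result=$(echo $(git log --format=%s topic))
echo "$result"
```

After this, topic no longer contains A or B at all, which is why --onto is usually reserved for deliberately transplanting part of a branch.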

    There are several flavors of rebase. A plain, non-interactive git rebase has traditionally used git format-patch | git am to copy commits, rather than actually running git cherry-pick, while git rebase -i actually uses git cherry-pick. (In older versions of Git, git rebase -i was a shell script that literally ran git cherry-pick. To make it faster, particularly on Windows, git rebase was modified so that -i is built in to Git's sequencer, which is the code that implements both cherry-pick and revert. Since Git 2.26 the plain git rebase also uses this sequencer-based backend by default, with --apply selecting the older format-patch/am one.)

    Note that all this copying, which goes one at a time, ends up building a linear chain of commits. This happens even if the inputs might include a merge, as in:

              A--B--M--C--D   <-- master
             /     /
    ...--o--*--o--S------o--T   <-- repo1/master
    

    You now ask Git to rebase (i.e., copy) some commits, in this case some commits that are on master, with the --onto target being T and the limit being the first commit reachable from T (repo1/master) that is also on master, which is commit *.

    The complete list of such commits is A then B then M then C then D. But how should Git copy M? If it tried, the result might look a lot like:

              A--B--M--C--D   [abandoned]
             /     /
    ...--o--*--o--S------o--T   <-- repo1/master
                             \
                              A'-B'-M'-C'-D'  <-- master (HEAD)
                                   /
                      ???----------
    

    except M', to be a merge, needs to have two parents. What other parent should it have? If its other parent is S, well, that's possible, but what value does it bring?

    (The point of a merge is to combine changes in two different lines of development. Since A' is based on T which is based on S, A' already includes whatever was in S and there is no need to merge it.)

    In general, Git simply omits the merge commits entirely here, so it ends up copying just A-B-C-D. Note that if you rebase something containing an internal merge, the same thing happens: Git simply copies both "sides" of the merge, linearizing the result:

                     C--D
                    /    \
                A--B      M--G   <-- topic (HEAD)
               /    \    /
              /      E--F
             /
    ...--o--*--o--o   <-- mainline
    

    Here git rebase will copy A-B-C-D-E-F-G or perhaps A-B-E-F-C-D-G, removing M and flattening the topology.
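The situation from the question can be reproduced in a few lines. This is only a sketch: a local branch named upstream stands in for repo1/master (an assumption made to avoid setting up a second repository), and the commit subjects are simplified.

```shell
# Sketch only: merge upstream once, then rebase onto it later; the merge is not copied.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit hello2
git branch upstream              # stand-in for repo1/master
commit bye                       # local work on master
git checkout -q upstream
commit hello3                    # "remote" work
git checkout -q master
git merge -q --no-edit upstream  # the merge commit the question asks about
commit see-ya
git checkout -q upstream
commit hello5                    # more "remote" work
git checkout -q master

merges_before=$(git rev-list --merges --count upstream..master)
git rebase -q upstream
merges_after=$(git rev-list --merges --count master)
history=$(echo $(git log --format=%s master))

echo "merges before: $merges_before, after: $merges_after"
echo "$history"
```

Only bye and see-ya are copied; the merge has nothing left to contribute because the copies already sit on top of everything upstream contains.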

    There is a -p flag to git rebase -i, which has a longer spelling --preserve-merges, but it doesn't actually preserve the merges (nor cherry-pick them, which is impossible). Instead, it makes new merges (by running git merge). This is quite tricky, but can be used to rebase the above A-B-(C-D, E-F)-M-G topology. Note that if you resolved merge conflicts in M, you will have to resolve them again when Git makes a new merge M' that merges D' and F' (git rerere may be useful here). In Git 2.18 and later the same job is done by --rebase-merges (-r), which works with or without -i; --preserve-merges has since been deprecated and removed from current Git.
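As a sketch of re-creating an internal merge, the following rebuilds the A-B-(C-D, E-F)-M-G topology in a scratch repository and rebases it with --rebase-merges, the option that newer Git versions provide for this purpose. It assumes Git 2.18 or later; the branch name side and the commit names are illustrative.

```shell
# Sketch only (assumes Git >= 2.18): rebase a topic that contains an internal merge.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit star
git checkout -q -b topic
commit A
commit B
git checkout -q -b side          # lower arm of the diagram
commit E
commit F
git checkout -q topic            # upper arm of the diagram
commit C
commit D
git merge -q --no-ff -m M side
commit G
git checkout -q mainline
commit m1
git checkout -q topic

old_merge=$(git rev-list --merges mainline..topic)
git rebase -q --rebase-merges mainline
new_merge=$(git rev-list --merges mainline..topic)
merge_subject=$(git log -1 --format=%s "$new_merge")
total=$(git rev-list --count mainline..topic)

echo "old merge: $old_merge"
echo "new merge: $new_merge ($merge_subject), $total commits after mainline"
```

The new merge has a different hash ID from the old one: it is a freshly made merge of the copied arms, not a copy of M.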


    [1] Permanent, that is, until the entire commit has been abandoned long enough for Git to be sure that no one wants it; then it gets cleaned away by git gc.
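Until that cleanup happens, an abandoned commit can still be found. A minimal sketch, assuming a scratch repository with default settings (so that branch reflogs are enabled) and illustrative names:

```shell
# Sketch only: after a rebase the old tip still exists and the reflog remembers it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline

# Hypothetical helper: create one file and commit it under that name.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit star
git checkout -q -b topic
commit A
git checkout -q mainline
commit m1
git checkout -q topic

old_tip=$(git rev-parse topic)
git rebase -q mainline
new_tip=$(git rev-parse topic)

old_type=$(git cat-file -t "$old_tip")     # the object is still in the repository
previous=$(git rev-parse 'topic@{1}')      # where topic pointed before the rebase

echo "old tip $old_tip is still a $old_type"
echo "topic@{1} = $previous"
```

In a real repository, git reflog or git log -g on the branch shows the same information, so a merge commit dropped by a rebase can be recovered by hash ID as long as it has not been garbage-collected.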