Isn't git merge --squash really git rebase --squash?

I'm trying to understand why the command

git merge --squash mybranch

isn't called

git rebase --squash mybranch

It seems to me that the operation is much more like a rebase than a merge. My understanding is that it searches back through the commit graph until it finds a common base commit for the current branch and mybranch. It then reapplies (rebases) all the commits from that base up to the head of mybranch onto the head of the current branch, but does so in the index and work-tree so that the result can be committed as a single commit. When it is done, there is no merge commit, as there would be in a normal merge, showing the two branches that were merged. Is my understanding correct?

Solution

Well, merging and rebasing are fundamentally different operations. Merge—by which I mean a regular git merge that creates a new merge commit—does indeed search back through the commit graph for the most recent common commit:

...--o--*--o--o--A   <-- mainbr
         \
          B--C--D--E   <-- sidebr

Here, the most recent common commit is *. The merge process will then compare (as in git diff) commit * with commit A to find out "what we did", and diff * against commit E to find out "what they did". It then makes a new single merge commit, with two parents:

...--o--*--o--o--A---M   <-- mainbr
         \          /
          B--C--D--E   <-- sidebr

which joins the two histories, and combines "what we did" and "what they did", so that diffing * vs M gives "one of each change".

Note that you get no choice of merge base here: Git figures it out, and that's that.
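To see the pieces of a regular merge for yourself, here is a minimal sketch in a throwaway repository. The file names, the identity settings, and the single commit standing in for B through E are assumptions made up for the illustration; only the branch names come from the drawings above.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainbr    # name the initial branch explicitly
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "*"
git branch sidebr                          # sidebr forks off at commit *

echo main > main.txt; git add main.txt; git commit -q -m "A"
git checkout -q sidebr
echo side > side.txt; git add side.txt; git commit -q -m "E"
git checkout -q mainbr

base=$(git merge-base mainbr sidebr)       # Git picks the merge base: commit *
git diff --stat "$base" mainbr             # "what we did"
git diff --stat "$base" sidebr             # "what they did"

git merge -q --no-edit sidebr              # makes merge commit M
git rev-list --parents -n 1 HEAD           # prints M followed by its two parent hashes
nparents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of M: $nparents"
```

The two git diff commands are only there to show what the merge compares; git merge does the equivalent work internally.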

Rebase, on the other hand, can be told both which commits to copy, and where to copy them, separately. It's true that, by default, it locates commit * again¹—but then it copies the original commits, one by one, using git cherry-pick or the equivalent; and finally, it moves the branch label to point at the last copied commit.

...--o--*--o--o--A   <-- mainbr
         \        \
          \        B'-C'-D'-E'   <-- sidebr
           \
            B--C--D--E

The original chain of commits (B-C-D-E, in this case) is still in the repository, and still findable: they can be found by hash ID, and in the reflog for sidebr, and if any other branch or tag name makes them reachable, they remain reachable by that name.
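The same kind of throwaway repository can be used to watch a rebase copy the commits and move the label. Again this is only a sketch: the file names and identity are invented for the example, and it assumes the copies apply without conflicts.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainbr
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "*"
git branch sidebr
echo main > main.txt; git add main.txt; git commit -q -m "A"

git checkout -q sidebr
for c in B C D E; do                       # four commits on the side branch
    echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"
done
old_tip=$(git rev-parse sidebr)            # the original E

git rebase -q mainbr                       # copy B..E on top of A, then move sidebr

new_tip=$(git rev-parse sidebr)            # E', a different commit from E
git log --oneline --graph mainbr sidebr
git cat-file -t "$old_tip"                 # the original E is still in the repository
git reflog show sidebr                     # and the reflog for sidebr still lists it
```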

What git merge --squash does is to modify the merge process just slightly: instead of making merge commit M, Git goes through the merge machinery as usual, diffing the merge base—which you don't get to choose—against the current commit and the other commit, and combining the changes in the index and work-tree. It then—for no obvious reason²—stops and makes you run git commit to commit the result, and when you do, it's an ordinary, non-merge commit, so that the whole graph-fragment looks like this:

...--o--*--o--o--A--F   <-- mainbr
         \
          B--C--D--E   <-- sidebr

Now, the contents of commit F (the snapshot tree resulting from the merge) are the same as the contents of commit M when we do a real merge, and, here's the real kicker, they are also the same as the contents of commit E' when we do a rebase, at least when everything applies cleanly.
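One way to check the claim about identical snapshots is to run all three operations from the same starting point and compare tree hashes. This is a sketch under assumptions: the try-merge, try-squash, and try-rebase branch names are hypothetical scratch branches, the file names are made up, and the changes are chosen so that nothing conflicts.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainbr
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "*"
git branch sidebr
echo main > main.txt; git add main.txt; git commit -q -m "A"
git checkout -q sidebr
for c in B C D E; do
    echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"
done
a=$(git rev-parse mainbr)                  # commit A

# Regular merge: produces M, with two parents.
git checkout -q -b try-merge "$a"
git merge -q --no-edit sidebr

# Squash merge: stops after updating the index and work-tree; we commit F ourselves.
git checkout -q -b try-squash "$a"
git merge -q --squash sidebr
git commit -q -m "F"
nparents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of F: $nparents"

# Rebase: copies B..E on top of A, ending at E'.
git checkout -q -b try-rebase sidebr
git rebase -q "$a"

# Print the three snapshot (tree) hashes for comparison.
git rev-parse 'try-merge^{tree}' 'try-squash^{tree}' 'try-rebase^{tree}'
```

The three printed hashes should match, while git log --graph on each scratch branch shows three different histories leading to that snapshot.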

Moreover, suppose there were only one commit (B) on the side branch sidebr. Now all three of merge, merge --squash, and rebase will give you a picture that ends with something we might just draw like this:

...--o--*--o--o--A--B'   <-- ???
         \          ?
          B?????????   <-- ???

and the contents of new commit B', i.e., the final snapshot, is the same in all three cases. However, for git merge, the new commit will be on branch mainbr and will point back to commits A and B, with sidebr pointing to B; for git merge --squash, the new commit will be on mainbr and will point back only to A; and for git rebase, the new commit will be on sidebr, with nothing obvious pointing to B at all, and we should draw this as:

...--o--*--o--o--A   <-- mainbr
         \        \
          B        B'  <-- sidebr

since mainbr will continue to point to commit A.

In the end, git merge --squash looks a bit more like a merge than it does like a rebase: it runs the merge machinery and adds its one new commit to the current branch. (However, I would be happier if it weren't called a "squash merge" at all.)

¹The method by which Git finds * is somewhat different: it's actually not a single commit, but rather just the last of a (usually) very large set of commits, namely, all those reachable from the <upstream> argument to git rebase. (Confusingly, a merge base can also be a set of commits, but it is a much more restricted set. Best not to dive into the graph theory yet. :-) )

²If we wanted it to stop, we could use --no-commit just as we do for regular, non-squash git merge. So why does it stop automatically?