Is it correct to say that a git rebase is equivalent to a git cherry-pick of certain commits from the other direction?

I'm trying to tighten up my understanding (and communication) of git commands.

Is it correct to say that

git checkout A
git rebase B

is exactly equivalent to

git checkout B
git cherry-pick <all_commits_from_common_ancestor_of_<A>_and_<B>_to_<A>>

And if not, in what scenario do they diverge?

Solution

It's not quite correct.

There are several stumbling blocks here. First, there may not be a common ancestor, or there might be more than one. (This is pretty minor: with no common ancestor, all commits get copied, and with several, all of them get omitted.) Second, the commit that rebase checks out is not necessarily the one named by B. Third, some commits can be omitted, and depending on the form of rebase, this can get a bit complicated. Last, the copying happens in detached HEAD mode, and afterward, rebase yanks the branch name around so that it points to the last copied commit (as if via git checkout -B or git switch -C, or git branch -f followed by git checkout or git switch).
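For the simple case where none of these complications bite, the "detach, cherry-pick, then move the branch name" description can be checked in a throwaway repository. This is only a sketch: the branch names A, B and A-manual, the commit helper, and the file names are all made up for the demo, and it compares resulting trees rather than claiming the two procedures are the same internally.

```shell
#!/bin/sh
# Sketch: a real rebase versus "detach at B, cherry-pick, move the name".
# Everything here (A, B, A-manual, commit helper) is hypothetical demo setup.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/B   # do not depend on the default branch name
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: one commit that adds one new file, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git branch A              # A and B share the commit "base"
commit b1                 # B moves ahead by one commit
git checkout -q A
commit a1
commit a2
git branch A-manual       # second name for the same two commits

# 1. The real thing: rebase A onto B.
git rebase -q B
rebased_tree=$(git rev-parse 'A^{tree}')

# 2. By hand: detach at B, copy B..A-manual, then yank the name over.
git checkout -q --detach B
git cherry-pick B..A-manual >/dev/null
git branch -f A-manual HEAD
git checkout -q A-manual
manual_tree=$(git rev-parse 'A-manual^{tree}')

echo "rebase: $rebased_tree"
echo "manual: $manual_tree"
```

The two printed tree IDs should match here. The interesting cases are the ones where the set of commits to copy is not simply B..A.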

The actual commits that are to be enumerated depend on the upstream argument to rebase, which can be specified this way:

git rebase --onto <target> <upstream>

or:

git rebase <upstream>

If the --onto <target> option is left out, the target is the same as the upstream. This is the commit that gets checked out (in detached-HEAD mode).
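To see the target and the upstream play different roles, here is a small sketch in a scratch repository. The branch names target, upstream and topic and the commit helper are invented for illustration: upstream only limits which commits are copied, while target is where the copies land.

```shell
#!/bin/sh
# Sketch: --onto separates "where copies go" from "which commits to copy".
# Branch names and the commit helper are hypothetical demo setup.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/target
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git checkout -q -b upstream
commit u1                       # only on upstream (and, for now, on topic)
git checkout -q -b topic
commit t1
commit t2

# Copy upstream..topic (t1 and t2) so that they come after target.
# u1 is excluded by the upstream argument, so it is left behind.
git rebase -q --onto target upstream

git log --format='%h %s' topic
copied=$(git rev-list --count target..topic)
```

After this, topic should consist of base, t1 and t2, with no u1.txt in the work tree.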

The rebase documentation first suggests that commits to be enumerated are those in:

upstream..HEAD

(before the checkout step of course, as that moves HEAD). This is not quite true, so the current documentation immediately corrects itself a bit:

This is the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.

Just a bit later, it adds this:

Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).

It's not until much later that it mentions that merge commits are omitted entirely, unless you use the -p (--preserve-merges) option, which was deprecated and has since been removed (in Git 2.34), or the -r (--rebase-merges) option, which was new in Git 2.18.

What's really going on here, though, is that Git is using the three-dot syntax for git rev-list, with the --left-right mode.¹ The base three-dot syntax:

git rev-list upstream...HEAD

enumerates all commits that are reachable from either commit, but not reachable from both commits. In set-theoretic terms, this is the symmetric difference of the two sets of reachable commits. It's described briefly in the gitrevisions documentation. This forces the revision-walking code to examine commits reachable from HEAD but not upstream and commits reachable from upstream but not HEAD. While it's doing that, Git computes a patch ID for each commit, as git patch-id would. This lets git rebase locate commits that are "the same" (in terms of what they change), and thus omit them if they've already been cherry-picked to the upstream branch.
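The symmetric difference is easy to look at directly. The following sketch uses a scratch repository with invented branch names (upstream, topic) and an invented commit helper; it only illustrates the rev-list syntax and is not a claim about what the C code inside git rebase literally runs.

```shell
#!/bin/sh
# Sketch: the three-dot symmetric difference, split into its two sides.
# Branch names and the commit helper are hypothetical demo setup.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/upstream
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git branch topic
commit u1
commit u2
git checkout -q topic
commit t1

# %m prints '<' for commits reachable only from the left name (upstream)
# and '>' for commits reachable only from the right name (HEAD).
git log --left-right --format='%m %s' upstream...HEAD

left=$(git rev-list --count --left-only upstream...HEAD)    # u1, u2
right=$(git rev-list --count --right-only upstream...HEAD)  # t1
both=$(git rev-list --count upstream...HEAD)                # all three
echo "left=$left right=$right total=$both"
```

The shared commit base is reachable from both names, so it appears on neither side.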

In particular, then, suppose you have:

...--o--*--D--E--B'--F   <-- their-branch
         \
          A--B--C   <-- your-branch (HEAD)

and you run git rebase their-branch to copy your three A-B-C commits to come after F. The rebase code will compute the patch-IDs from A, B, and C, and also the patch-IDs from D, E, B', and F. Given that commit B' is a copy of your commit B, it probably² has the same patch-ID. So Git will omit B from the list of commits to copy.
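Here is a sketch that rebuilds this exact graph in a scratch repository and checks both claims: that B and B' get the same patch-ID, and that the rebase copies only two commits. The commit and patch_id helpers are made up for the demo, and each commit just adds one file so that nothing conflicts.

```shell
#!/bin/sh
# Sketch: a commit already cherry-picked upstream is dropped by rebase.
# The commit and patch_id helpers are hypothetical demo setup.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/their-branch
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base                     # the * commit in the drawing
git branch your-branch
commit D
commit E
git checkout -q your-branch
commit A
commit B
commit C
git checkout -q their-branch
git cherry-pick your-branch~1 >/dev/null   # B', a copy of B
commit F
git checkout -q your-branch

# Patch ID of a single commit: first field of git patch-id's output.
patch_id() { git diff-tree -p "$1" | git patch-id | cut -d' ' -f1; }
id_B=$(patch_id your-branch~1)
id_Bprime=$(patch_id their-branch~1)
echo "B:  $id_B"
echo "B': $id_Bprime"

# '=' marks commits that have a patch-equivalent on the other side.
git log --left-right --cherry-mark --format='%m %s' their-branch...HEAD

git rebase -q their-branch
copied=$(git rev-list --count their-branch..your-branch)
echo "commits copied: $copied"
```

The count at the end should be 2 (the copies of A and C), even though your-branch started with three commits of its own.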

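The --fork-point mode is described somewhat obliquely, but you should first note that the default depends on the command line: if you give an <upstream> argument or --root, the default is --no-fork-point; otherwise (when rebase falls back to the branch's configured upstream) the default is --fork-point. Fork-point mode works by consulting the reflog of the upstream to find a better common ancestor. See Git rebase - commit select in fork-point mode for more about this.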

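There is a relatively new --keep-base option that really does a merge base computation: the copies are placed on top of the merge base of the upstream and your branch, rather than on top of the upstream itself. You can get much the same effect by giving the three-dot syntax to --onto, as in git rebase --onto <upstream>...HEAD <upstream>, or simply use the --keep-base option.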
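A minimal sketch of the three-dot form, again in a scratch repository with invented names (upstream, topic, commit helper). It uses the --onto spelling so that it does not depend on having a Git new enough for --keep-base; treating the two as interchangeable is an approximation, since the documented equivalence also involves a couple of other options.

```shell
#!/bin/sh
# Sketch: "--onto upstream...HEAD" keeps the branch on its original base.
# Branch names and the commit helper are hypothetical demo setup.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/upstream
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git branch topic
commit u1                       # upstream moves ahead
git checkout -q topic
commit t1
commit t2

base_before=$(git merge-base upstream HEAD)

# As an --onto argument, "upstream...HEAD" names the merge base of the two.
# On a new enough Git, "git rebase --keep-base upstream" should behave
# roughly the same way.
git rebase -q --onto upstream...HEAD upstream

base_after=$(git merge-base upstream HEAD)
echo "before: $base_before"
echo "after:  $base_after"
```

Run non-interactively like this, there is nothing to change, which is the point: the branch stays where it was instead of moving to the tip of upstream. The option earns its keep with -i, where you want to edit or squash your own commits without also picking up new upstream work.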

Finally, the omission of merge commits (except with -r, or the -p option that you should avoid) occurs because Git literally can't copy a merge. The omission of merges is basically the same as using the --no-merges option when running git rev-list. When you use the -r option, Git will enumerate the merges and make note of them, and will use the fancier new interactive scripting mode to re-perform the merges. That is, given a graph fragment like this:

...--o--*-------F   <-- their-branch
         \
          \   B
           \ / \
            A   D--E   <-- your-branch (HEAD)
             \ /
              C

a git rebase -r will produce:

                      B'
                     / \
                    A'  G--E'   <-- your-branch (HEAD)
                   / \ /
                  /   C'
                 /
...--o--*-------F   <-- their-branch
         \
          \   B
           \ / \
            A   D--E   [abandoned]
             \ /
              C

where new merge commit G is produced by literally running git merge on commits B' and C'. If you made D as an evil merge using git merge --no-commit, the evilness will be lost during this re-merging. The remaining commits, marked with the prime suffix (A' etc), are done by copying, using the underlying cherry-pick machinery.³
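The difference between the default and -r can be sketched by building this graph twice over in a scratch repository. The branch names side and flat and the commit helper are made up for the demo, GIT_EDITOR=true is only there as a guard in case anything tries to open an editor, and -r needs Git 2.18 or later.

```shell
#!/bin/sh
# Sketch: plain rebase flattens the merge, rebase -r re-performs it.
# Branch names (side, flat) and the commit helper are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/their-branch
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit base
git branch your-branch
commit F
git checkout -q your-branch
commit A
git branch side
commit B
git checkout -q side
commit C
git checkout -q your-branch
git merge -q -m D side          # the merge commit D
commit E
git branch flat                 # second name, for the plain rebase

# What the enumeration sees, with and without merges.
all=$(git rev-list --count their-branch..your-branch)
no_merges=$(git rev-list --count --no-merges their-branch..your-branch)

# Default rebase: the merge is dropped and history becomes linear.
git checkout -q flat
git rebase -q their-branch
flat_merges=$(git rev-list --count --merges their-branch..flat)

# rebase -r: the merge is re-performed on the copied commits.
git checkout -q your-branch
git rebase -q -r their-branch
kept_merges=$(git rev-list --count --merges their-branch..your-branch)

echo "all=$all no_merges=$no_merges flat=$flat_merges kept=$kept_merges"
git log --graph --format='%s' their-branch..your-branch
```

With the default rebase there should be four linear copies and no merge; with -r there should be five commits, one of them a fresh merge.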

¹In the old days, git rebase was several shell scripts, and one of them really did run git rev-list like this, although it used --right-only --cherry-pick, as I recall. It's been rewritten in C since then and now it's ... more complicated. :-)

²Whether it has the same patch-ID depends on whether someone had to modify it while copying it. See the git patch-id documentation for details.

³In older versions of Git, the default is actually to use git format-patch and git am, or the internal equivalent. This is physically unable to copy merges either, and misses some rename cases that cherry-pick detects. During the addition of the new -r option, everything was set up to switch the default to use cherry-pick, and fairly recently (in Git 2.26), that became the new default.