I'm trying to tighten up my understanding (and communication) of git commands.
Is it correct to say that
git checkout A
git rebase B
is exactly equivalent to
git checkout B
git cherry-pick <all_commits_from_common_ancestor_of_<A>_and_<B>_to_<A>>
And if not, in what scenario do they diverge?
It's not quite correct.
There are several stumbling blocks here. First, there may not be a common ancestor, or there might be more than one. (This is pretty minor: that just means all commits get copied, or all common ancestors get omitted.) Second, we might not check out the commit specified by B here. Third, some commits can be omitted, and depending on the form of rebase, this can get a bit complicated. Last, the copying happens in detached HEAD mode, and afterward, rebase yanks the branch name around (as if via git checkout -B or git switch -C, or git branch -f followed by git checkout or git switch).
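For the simplest case (one merge base, nothing omitted), the manual sequence can be compared with a real rebase in a throwaway repository. This is only a sketch: the commit helper, the file names, and the temporary directory are my own inventions, and it compares the resulting trees and subjects rather than hash IDs, since the copies get new committer timestamps.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add one new file and commit it under the same name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit base
git branch B
git checkout -q -b A
commit a1
commit a2
git checkout -q B
commit b1

# Manual version: detach at B and copy the commits that are on A but not on B.
git checkout -q --detach B
git cherry-pick B..A >/dev/null
manual_tree=$(git rev-parse 'HEAD^{tree}')
manual_subjects=$(git log --format=%s B..HEAD)
# A real rebase would finish here with the equivalent of: git checkout -B A

# Real version: A is still untouched, so rebase it for comparison.
git checkout -q A
git rebase -q B
rebase_tree=$(git rev-parse 'HEAD^{tree}')
rebase_subjects=$(git log --format=%s B..A)

echo "manual tree: $manual_tree"
echo "rebase tree: $rebase_tree"
```

The two trees match here only because none of the complications listed above apply to this tiny history.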
The actual commits that are to be enumerated depend on the upstream argument to rebase, which can be specified this way:
git rebase --onto <target> <upstream>
or:
git rebase <upstream>
If the --onto <target> option is left out, the target is the same as the upstream. The target is the commit that gets checked out (in detached-HEAD mode) before any copying starts.
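To see the target and the upstream play different roles, here is a small sketch with made-up branch names (trunk, topic1, topic2) and a hypothetical commit helper. The upstream (topic1) only limits which commits get copied; the target (trunk) is where the copies land.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add one new file and commit it under the same name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit base
git branch -m trunk            # give the initial branch a known name
git checkout -q -b topic1
commit t1
git checkout -q -b topic2
commit t2
git checkout -q trunk
commit m1

# Commits that will be copied: reachable from topic2 but not from topic1.
to_copy=$(git log --format=%s topic1..topic2)

# Copy them onto trunk (the target), not onto topic1 (the upstream).
git rebase -q --onto trunk topic1 topic2

after=$(git log --format=%s trunk..topic2)
echo "copied: $to_copy"
echo "now on top of trunk: $after"
```

After this, topic2 no longer contains t1 at all, which is the usual reason to reach for --onto.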
The rebase documentation first suggests that commits to be enumerated are those in:
upstream..HEAD
(before the checkout step of course, as that moves HEAD). This is not quite true, so the current documentation immediately corrects itself a bit:
This is the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.
Just a bit later, it adds this:
Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).
It's not until much later that it mentions that merge commits are omitted entirely, unless you use the -p option (deprecated, and removed entirely in Git 2.34) or the -r option (new in Git 2.18).
What's really going on here, though, is that Git is using the three-dot syntax for git rev-list, with the --left-right mode.[1] The base three-dot syntax:
git rev-list upstream...HEAD
enumerates all commits that are reachable from either commit, but not reachable from both commits. In set-theory terms, this is the symmetric difference. It's described briefly in the gitrevisions documentation. This forces the revision-walking code to examine commits reachable from HEAD but not upstream and commits reachable from upstream but not HEAD. While it's doing that, Git computes a patch ID (as git patch-id would) for each commit. This lets git rebase locate commits that are "the same" (in terms of what they change), and thus omit them if they've already been cherry-picked to the upstream branch.
In particular, then, suppose you have:
...--o--*--D--E--B'--F   <-- their-branch
         \
          A--B--C   <-- your-branch (HEAD)
and you run git rebase their-branch to copy your three A-B-C commits to come after F. The rebase code will compute the patch-IDs from A, B, and C, and also the patch-IDs from D, E, B', and F. Given that commit B' is a copy of your commit B, it probably[2] has the same patch-ID. So Git will omit B from the list of commits to copy.
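The graph above can be rebuilt in a scratch repository to watch B get dropped. This is a sketch under my own assumptions: lowercase subjects stand in for the commit letters, each commit adds one file via a hypothetical commit helper, and B' is made with a plain git cherry-pick so that its patch-ID matches B's.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add one new file and commit it under the same name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit base
git checkout -q -b your-branch
commit a
commit b
commit c
git checkout -q -b their-branch your-branch~3   # start again from "base"
commit d
commit e
git cherry-pick your-branch~1 >/dev/null        # B': an unmodified copy of b
commit f
git checkout -q your-branch

# Roughly what rebase enumerates: our side of the symmetric difference,
# minus commits whose patch-ID also appears on the other side.
candidates=$(git log --format=%s --right-only --cherry-pick their-branch...your-branch)
echo "candidates:"
echo "$candidates"

git rebase -q their-branch

# Only a and c were copied; the single "b" in the history is their B'.
copied=$(git log --format=%s their-branch..your-branch)
echo "copied:"
echo "$copied"
```

If someone had edited B' while copying it, the patch-IDs would differ, Git would try to copy B as well, and you would most likely get a conflict or an empty commit to deal with.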
The --fork-point mode is described somewhat obliquely, but you should first note that --fork-point is the default option in some cases, and --no-fork-point is the default option in others. The way fork-point mode works is to use your reflogs. See Git rebase - commit select in fork-point mode for more about this.
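There is a relatively new --keep-base option that really does a merge base computation: it uses the merge base of the upstream and your branch as the target. You can ask for the same target directly by giving the three-dot syntax to --onto (as in --onto <upstream>...HEAD), or just use the --keep-base option.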
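Here is a small sketch of the three-dot form, with invented branch names and a hypothetical commit helper. It uses --onto their-branch...HEAD rather than --keep-base itself so that it also runs on Git versions that predate the newer option; in a plain non-interactive run like this there is nothing to change, so the point is only that the base stays where it was.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add one new file and commit it under the same name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit base
git checkout -q -b your-branch
commit a
git checkout -q -b their-branch your-branch~1
commit f
git checkout -q your-branch

base_before=$(git merge-base their-branch your-branch)

# The target is the merge base of their-branch and HEAD, so your commits
# stay on their original base instead of moving to the tip of their-branch.
git rebase -q --onto their-branch...HEAD their-branch

base_after=$(git merge-base their-branch your-branch)
echo "merge base before: $base_before"
echo "merge base after:  $base_after"
```

This form is mostly useful together with -i, when you want to edit or squash your own commits without also picking up whatever is new upstream.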
Finally, the omission of merge commits (except with -r, or the -p option that you should avoid) occurs because Git literally can't copy a merge. The omission of merges is basically the same as using the --no-merges option when running git rev-list. When you use the -r option, Git will enumerate the merges and make note of them, and will use the fancier new interactive scripting mode to re-perform the merges. That is, given a graph fragment like this:
...--o--*-------F   <-- their-branch
         \
          \   B
           \ / \
            A   D--E   <-- your-branch (HEAD)
             \ /
              C
a git rebase -r will produce:
                      B'
                     / \
                    A'  G--E'   <-- your-branch (HEAD)
                   / \ /
                  /   C'
                 /
...--o--*-------F   <-- their-branch
         \
          \   B
           \ / \
            A   D--E   [abandoned]
             \ /
              C
where new merge commit G is produced by literally running git merge on commits B' and C'. If you made D as an evil merge using git merge --no-commit, the evilness will be lost during this re-merging. The remaining commits, marked with the prime suffix (A' etc), are done by copying, using the underlying cherry-pick machinery.[3]
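The difference between the two modes is easy to check by counting commits. The sketch below rebuilds the graph above with invented names (side-c for the C side of the merge, flat for a second copy of the branch, and a hypothetical commit helper), then rebases one copy with -r and the other without. It assumes Git 2.18 or later for -r.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: add one new file and commit it under the same name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit base
git branch their-branch
git checkout -q -b your-branch
commit a
git branch side-c              # second line of development, forking at a
commit b
git checkout -q side-c
commit c
git checkout -q your-branch
git merge -q -m d side-c       # merge commit D with parents b and c
commit e
git checkout -q their-branch
commit f
git branch flat your-branch    # a second copy of the branch, for comparison

# With -r, the merge is re-performed on top of the copied commits.
git checkout -q your-branch
git rebase -q -r their-branch
nested_count=$(git rev-list --count their-branch..your-branch)
nested_merges=$(git rev-list --count --merges their-branch..your-branch)

# Without -r, the merge is dropped and the rest is linearized.
git checkout -q flat
git rebase -q their-branch
flat_count=$(git rev-list --count their-branch..flat)
flat_merges=$(git rev-list --count --merges their-branch..flat)

echo "with -r:    $nested_count commits, $nested_merges merge(s)"
echo "without -r: $flat_count commits, $flat_merges merge(s)"
```

Every commit here touches its own file, so neither rebase hits a conflict; a real history with an evil merge would not be this forgiving.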
[1] In the old days, git rebase was several shell scripts, and one of them really did run git rev-list like this, although it used --right-only --cherry-pick, as I recall. It's been rewritten in C since then and now it's ... more complicated. :-)
[2] Whether it has the same patch-ID depends on whether someone had to modify it while copying it. See the git patch-id documentation for details.
[3] In older versions of Git, the default is actually to use git format-patch and git am, or the internal equivalent. This is physically unable to copy merges either, and misses some rename cases that cherry-pick detects. During the addition of the new -r option, everything was set up to switch the default to use cherry-pick, and in Git 2.26 that became the new default.