Why do rebased commit ids differ from cherry-picked ids?

This question stems from a nasty little merge conflict that I got myself into when I accidentally cherry-picked from my tracked branch into my tracking branch instead of rebasing. Fixing it was quite easy, but I'm still trying to wrap my head around why it was an issue in the first place.

Let's say I have the branches below (tracking is based on tracked), each drawn as a series of commits with the hash in parentheses and arrows pointing to parent commits.

tracked: a(123) <- b(234) <- c(345)

tracking: a(123) <- b(234) <- c(345)

Let's say a new commit d with commit ID 456 gets into tracked, so that the branches are now in this state:

tracked: a(123) <- b(234) <- c(345) <- d(456)

tracking: a(123) <- b(234) <- c(345)

I now cherry-pick 456 onto tracking, leading to this state of tracking:

tracking: a(123) <- b(234) <- c(345) <- d(somethingnot456)

However, if I had just run git rebase tracked instead, it would have been:

tracking: a(123) <- b(234) <- c(345) <- d(456)

So why do the ids differ above?

I have seen many questions about rebase vs cherry-pick, but I haven't managed to come across an answer for this specific question. Thanks.

Solution

Rebase and (repeated) cherry-pick are essentially the same thing, but they're not 100% exactly the same thing. In this particular case, the key is what gets copied, which is, well, nothing at all.

Let me redraw your example the way I prefer to express Git graph fragments. Instead of:

tracked: a(123) <- b(234) <- c(345)

tracking: a(123) <- b(234) <- c(345)

let's draw this as:

A(123) <- B(234) <- C(345)   <-- tracking, tracked

because, after all, each commit is unique: there's only one copy of A, one copy of B, one of C, and soon to be one of D. Meanwhile the two labels (tracking and tracked) both point to commit C, whose hash is 345whatever.

Now you add your new commit D(456) to tracked, so tracking still points to C(345):

A(123) <- B(234) <- C(345)          <-- tracking
                          \
                           D(456)   <-- tracked

Cherry-pick always copies

What git cherry-pick <commit> does is, in essence:

1. diff the given commit against its parent (so, D vs C);
2. on the current branch (tracking), apply the same changes; and
3. make a new commit with the same message, but a different ID.
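The first two steps can be sketched in Python under a big simplifying assumption: each snapshot is modelled as a plain {path: content} dict (a hypothetical stand-in for a Git tree), and the changes are applied naively, whereas real Git does a three-way merge and can stop with conflicts. The function names are made up for illustration.

```python
# Hypothetical model: a snapshot is {path: content}; None in a change set means "deleted".

def diff_snapshots(parent_snap, commit_snap):
    """Step 1: the changes that turn the parent's snapshot into the commit's snapshot."""
    changes = {}
    for path in parent_snap.keys() | commit_snap.keys():
        if parent_snap.get(path) != commit_snap.get(path):
            changes[path] = commit_snap.get(path)
    return changes

def apply_changes(snapshot, changes):
    """Step 2: apply those changes to the current branch's snapshot (no conflict handling)."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result

snap_c = {"README": "v1"}
snap_d = {"README": "v1", "d.txt": "feature d"}

changes = diff_snapshots(snap_c, snap_d)   # D vs C
print(changes)                             # {'d.txt': 'feature d'}
new_snap = apply_changes(snap_c, changes)  # tracking still points at C
print(new_snap == snap_d)                  # True
# Step 3 would wrap new_snap in a brand-new commit object carrying D's message.
```

In this scenario the resulting snapshot is identical to D's; what differs is the commit object that step 3 builds around it.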

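This is of course just what you've seen before. Your current branch (tracking) acquires new commit D': a copy of D, but with a different number. Here D' even has the same parent (C), the same snapshot, the same author and the same message as D, but cherry-pick records you as the committer with the current time as the committer date, and since a commit's ID is a hash over all of that data, D' gets a new ID.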
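Here is a simplified Python sketch of how Git derives a commit ID in a SHA-1 repository: the ID is the SHA-1 of a short header ("commit", the size, a NUL byte) followed by the commit's text (tree, parents, author, committer, message). The tree and parent hashes, the name and the timestamps below are hypothetical placeholders, and optional headers such as signatures are left out.

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """SHA-1 over the header 'commit <size>' plus a NUL byte, then the commit text."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

TREE = "e" * 40   # hypothetical hash of the snapshot that D (and D') records
C_ID = "c" * 40   # hypothetical full hash of commit C
ORIGINAL = "Dev <dev@example.com> 1700000000 +0000"   # when D was made
LATER = "Dev <dev@example.com> 1700003600 +0000"      # when D was cherry-picked

d = commit_id(TREE, [C_ID], ORIGINAL, ORIGINAL, "Add d\n")
d_prime = commit_id(TREE, [C_ID], ORIGINAL, LATER, "Add d\n")  # author kept, new committer date
print(d == d_prime)  # False: one changed timestamp changes the whole hash
```

Everything fed into the hash is identical except the committer line, and that alone is enough to produce a completely different ID.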

Rebase finds which commits need to be copied

Rebase, on the other hand, works by first getting a list of all the commits that your current branch (tracking) has and that your <upstream> branch (tracked) does not. Specifically, these are the commits that git rev-list will list:

$ git rev-list tracked..tracking
$

There are no such commits, which is easy to see from the drawing. We don't even need the hashes:

A <- B <- C     <-- tracking
           \
            D   <-- tracked

Starting from tracking, we work our way leftward along the arrows, marking each commit we reach; then, starting from tracked, we work our way leftward again along the arrows, unmarking each commit. Since D leads back to C, this unmarks everything, and we copy nothing at all.
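The mark-then-unmark walk amounts to a set difference between the commits reachable from each label. Below is a Python sketch of that idea; the single-letter commit names and the PARENTS/BRANCHES dicts are a hypothetical toy model of the drawing, not Git's own data structures, and real git rev-list also orders its output, which this sketch ignores.

```python
# Toy model of the drawing: each commit maps to its list of parents.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
BRANCHES = {"tracking": "C", "tracked": "D"}

def reachable(start, parents):
    """Every commit found by following parent arrows leftward from start."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def rev_list_range(upstream, branch, parents, branches):
    """Rough equivalent of `git rev-list upstream..branch`, as an unordered set."""
    marked = reachable(branches[branch], parents)            # mark from branch
    return marked - reachable(branches[upstream], parents)   # unmark from upstream

print(sorted(rev_list_range("tracked", "tracking", PARENTS, BRANCHES)))  # []
print(sorted(rev_list_range("tracking", "tracked", PARENTS, BRANCHES)))  # ['D']
```

The second call, the equivalent of git rev-list tracking..tracked, lists only D: the one commit that tracked has and tracking lacks.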

If we had a commit on tracking that wasn't on tracked:

A--B--C--E   <-- tracking
       \
        D    <-- tracked

then rebase would copy E, making a new (different ID) commit E'. The copy of E would go after D, like this:

A--B--C--E   <-- tracking
       \
        D    <-- tracked
         \
          E' [rebase in progress]

Then, rebase moves the branch label

Once git rebase is done with all its copying, it takes note of where it stopped—at D, if there was nothing to copy; at E', or maybe even F' or G' or whatever if there were commits to copy—and then it peels the old branch label (tracking) off and pastes it on the new point:

A--B--C--E   [abandoned]
       \
        D    <-- tracked
         \
          E' <-- tracking

When there's no E to copy we get this instead:

A--B--C
       \
        D   <-- tracked, tracking

i.e., both branch labels now point to commit D, which was not copied at all. (There's no reason to keep the little downward leg in the graph either, and there are no commits to abandon. Even in the earlier case, abandoning E would not abandon C, because C is still findable from D.)
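The whole sequence (list the commits, copy them onto the upstream tip, then move the label) can be sketched as a toy Python function. This is a hypothetical simplification: history is assumed linear, there are no merges or conflicts, copies are named with a prime instead of being hashed, and real Git details such as skipping commits whose changes are already upstream are left out.

```python
def reachable(start, parents):
    """Every commit found by following parent arrows back from start."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def toy_rebase(branch, upstream, parents, branches):
    """Copy branch-only commits onto upstream's tip, then move the branch label."""
    exclude = reachable(branches[upstream], parents)
    to_copy, commit = [], branches[branch]
    while commit is not None and commit not in exclude:
        to_copy.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    to_copy.reverse()                  # oldest first, e.g. E before F
    new_tip = branches[upstream]
    for old in to_copy:
        copy = old + "'"               # a new commit, hence a new ID (E -> E')
        parents[copy] = [new_tip]      # whose parent is the new base
        new_tip = copy
    branches[branch] = new_tip         # peel the label off, paste it on the new tip
    return to_copy

# With E on tracking: E is copied to E' on top of D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
branches = {"tracking": "E", "tracked": "D"}
print(toy_rebase("tracking", "tracked", parents, branches), branches["tracking"])  # ['E'] E'

# Without E: nothing is copied and tracking simply moves to D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
branches = {"tracking": "C", "tracked": "D"}
print(toy_rebase("tracking", "tracked", parents, branches), branches["tracking"])  # [] D
```

In the first case the old E is still present in parents, just no longer reachable from any label, which is the "abandoned" state in the drawing; in the second case tracking ends up on the very same commit D, so its ID stays 456.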