Difference between git rebase and git reset


Say I have this:

A - B - C - E - F    [integration]
 \
  G - H - I          [feature]

after commit I, we rebase with integration:

git fetch origin
git rebase integration

so now we have:

A - B - C - E - F    [integration]
 \
  B - C - E - F - G - H - I          [feature]

and then say we merge the feature branch into integration, we then have:

A - B - C - E - F - G - H - I   [integration]
 \
  B - C - E - F - G - H - I          [feature]

(I think that is right), but I don't see how that is any different than not rebasing at all?


Solution

  • Your drawing is misleading you.

    Remember these things:

    • A commit's "true name" is its hash ID.
    • Each commit stores the hash ID of its parent commit, or for a merge commit, all of its parents.
    • No commit can ever be changed at all, but commits can be copied to new replacements.
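
To see the first two points for ourselves, here is a minimal sketch. It assumes a throwaway repository in a temporary directory, with a placeholder identity and a made-up file.txt; none of these names come from the question.

```shell
set -e
repo=$(mktemp -d)                           # scratch repository, assumption for the demo
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo a > file.txt
git add file.txt
git commit -q -m A
echo b > file.txt
git commit -q -am B

child=$(git rev-parse HEAD)     # hash ID of commit B
parent=$(git rev-parse HEAD^)   # hash ID of commit A

# The raw commit object carries a "parent <hash>" header line.
git cat-file -p "$child"
recorded=$(git cat-file -p "$child" | sed -n 's/^parent //p')
echo "B records parent: $recorded"
```

The hash that B records is exactly A's hash ID, which is why B cannot be re-pointed at some other parent without becoming a different commit.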

    If it helps, think of commits as big, solid things: bricks and beams making up a building, for instance. (Like oversized Lego bricks, each brick has some connector(s) to some other brick, and we plug our bricks together to make chains. These connections are via the hash IDs: they come out of the child commit and point towards the parent.)

    Branch names, on the other hand, are very lightweight items. They're like sticky notes that you slap on a commit, then peel off later and slap on a different commit.
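
The sticky-note behavior can be sketched like this. The branch name note and the scratch repository are hypothetical, chosen only for illustration.

```shell
set -e
repo=$(mktemp -d)                           # scratch repository, assumption for the demo
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo a > file.txt
git add file.txt
git commit -q -m A
first=$(git rev-parse HEAD)
echo b > file.txt
git commit -q -am B
second=$(git rev-parse HEAD)

git branch note "$first"        # slap the label on the first commit
before=$(git rev-parse note)
git branch -f note "$second"    # peel it off and stick it on the second
after=$(git rev-parse note)

echo "note was at $before, is now at $after"
```

Moving the label touches neither commit: both hash IDs are still in the repository afterwards.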

    So if you have this:

    A <-B <-C <-E <-F   <-- integration
     \
      G <-H <-I   <-- feature
    

    you cannot get your second picture, because existing commit G records A's hash ID. You cannot have a G that has F as its parent. You also should not draw a commit twice, if at all possible: commits are unique things, there's only one commit G.

    git cherry-pick is the building block for copying commits

    Often, we find ourselves in a situation in which we have a commit like G that's OK as it is, but we'd like it more if it were a little bit different. We'd like a new copy that's like G, but has F as its parent, and has a different source tree snapshot than the original G too. Let's call the new commit G' to distinguish it from G but remind us that it's a lot like G. We want the difference between F and G' to be the same as the difference between A and G, thereby accounting for any changes needed because of commits B-C-E-F too. So what we want is a commit graph that looks like this:

                  G'  <-- new-and-improved-feature
                 /
    A--B--C--E--F   <-- integration
     \
      G--H--I   <-- feature
    

    If we then copy commit H to a new and improved H', and copy I to a new and improved I', we get this:

                  G'-H'-I'  <-- new-and-improved-feature
                 /
    A--B--C--E--F   <-- integration
     \
      G--H--I   <-- feature
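
The copying step might look like the following sketch. It rebuilds the graph above in a scratch repository; the commit helper and the one-file-per-commit layout are assumptions made so that nothing conflicts.

```shell
set -e
repo=$(mktemp -d)                               # scratch repository, assumption for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/integration    # call the first branch "integration"
git config user.name "Example User"             # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: each commit adds its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git branch feature                       # feature forks off at A
for c in B C E F; do commit "$c"; done
git checkout -q feature
for c in G H I; do commit "$c"; done

# Start the new branch at F, then copy G, H and I one at a time.
git checkout -q -b new-and-improved-feature integration
for c in feature~2 feature~1 feature; do
  git cherry-pick "$c" > /dev/null
done

git log --oneline --graph --all

# A-to-G and F-to-G' should be the same change.
orig_diff=$(git diff feature~3 feature~2)
copy_diff=$(git diff integration new-and-improved-feature~2)
```

The originals are untouched: feature still names I, whose ancestry leads back to A, while the copies sit on top of F.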
    

    git reset moves labels

    What git reset does—well, one of several things it can do, but this is what we do with it now—is to move the branch name sticky-labels.

    There is one sticky-label on which we wrote the word feature. That sticky-label is attached to commit I right now. But we just used git cherry-pick three times, to copy the G-H-I sequence to the G'-H'-I' sequence, in our new and improved setup.

    If we now have Git peel the label feature off commit I, and paste it onto commit I' instead, we get this:

                  G'-H'-I'  <-- feature (HEAD), new-and-improved-feature
                 /
    A--B--C--E--F   <-- integration
     \
      G--H--I   <-- ORIG_HEAD
    

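    To make that happen, we run: git checkout feature; git reset --hard new-and-improved-feature. (The --hard also updates the index and working tree to match I'; a plain git reset moves the label just the same, but leaves the files from I in the working tree.)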

    The git reset command sets this special name ORIG_HEAD to remember where feature used to go. Now the label feature is attached to commit I', but there are ways to find I, including this ORIG_HEAD trick.
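
Here is a sketch of the label move, again in a scratch repository with a hypothetical commit helper. It uses git reset --hard so that the working tree follows the label; that flag is a choice made for this demo.

```shell
set -e
repo=$(mktemp -d)                               # scratch repository, assumption for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/integration
git config user.name "Example User"             # placeholder identity
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Rebuild the graph and make the copies G'-H'-I'.
commit A
git branch feature
for c in B C E F; do commit "$c"; done
git checkout -q feature
for c in G H I; do commit "$c"; done
git checkout -q -b new-and-improved-feature integration
git cherry-pick integration..feature > /dev/null

old_tip=$(git rev-parse feature)                     # original I
new_tip=$(git rev-parse new-and-improved-feature)    # copy I'

# Peel the feature label off I and paste it onto I'.
git checkout -q feature
git reset -q --hard new-and-improved-feature
git branch -q -d new-and-improved-feature            # the extra label is no longer needed

echo "feature:   $(git log -1 --format='%h %s' feature)"
echo "ORIG_HEAD: $(git log -1 --format='%h %s' ORIG_HEAD)"
```

Both lines print the subject I, but with different abbreviated hashes: feature now names the copy, and ORIG_HEAD remembers the original.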

    (We no longer need the "new and improved feature" label, so we can delete it now.)

    Note that no commit has changed. The original G is still in the repository. Running git log ORIG_HEAD, we can still see it, at least until we do another Git command that uses ORIG_HEAD to remember some other commit. We'll see I, then H, then G. We can also use the reflog for feature to find the hash ID for commits G, H, and I. As long as we have the hash ID or a name for the hash ID, we can find the commit. (Those reflog entries eventually expire—they have a date stamp, and after a month or three, Git removes the reflog entry.)
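
A sketch of digging the originals back out, assuming the same scratch setup as before (hypothetical commit helper, placeholder identity, reset done with --hard):

```shell
set -e
repo=$(mktemp -d)                               # scratch repository, assumption for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/integration
git config user.name "Example User"             # placeholder identity
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git branch feature
for c in B C E F; do commit "$c"; done
git checkout -q feature
for c in G H I; do commit "$c"; done
old_tip=$(git rev-parse feature)                # original I, saved for comparison

git checkout -q -b new-and-improved-feature integration
git cherry-pick integration..feature > /dev/null
git checkout -q feature
git reset -q --hard new-and-improved-feature

# Route 1: ORIG_HEAD still names the original I.
git log --format='%h %s' ORIG_HEAD

# Route 2: the reflog for feature lists every commit the label has pointed to.
git reflog show feature
previous=$(git rev-parse 'feature@{1}')         # where feature pointed just before the reset
echo "feature used to be at $previous"
```

Either route gives us a hash ID, and with a hash ID we can bring the old commits back, for example by creating a new branch name that points to it.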

    If we use the name feature, though, we find the new copies instead of the original commits. This makes it seem like the commits have changed, as long as we don't pay close attention and notice that in fact, these are new commits.

    The bottom line is this: After we copy the commits, if we use git reset to abandon the originals in favor of the new-and-improved copies, we'll see only the new-and-improved copies, and we can act like the commits have changed. The commits haven't changed, and if anyone really looks closely, they will discover our secret, but if someone else never knew about the originals they cannot discover that these are cheap knock-offs, er, improved copies.

    git rebase = cherry-pick plus reset

    This brings us to the conclusion: git rebase is fundamentally git cherry-pick of some set of commits followed by git reset. That is, we start by copying commits to new and—we hope—improved versions; then we use git reset to attempt to trick everyone into using the improved commits in place of the originals.
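
As a rough check of that claim, the sketch below does the job twice in a scratch repository: once by hand on a second label (manual, a hypothetical name) and once with git rebase on feature. It compares only the resulting snapshots and parents, and it does not claim that rebase is implemented this way internally.

```shell
set -e
repo=$(mktemp -d)                               # scratch repository, assumption for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/integration
git config user.name "Example User"             # placeholder identity
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git branch feature
for c in B C E F; do commit "$c"; done
git checkout -q feature
for c in G H I; do commit "$c"; done
old_tip=$(git rev-parse feature)     # original I
git branch manual feature            # second label on I for the by-hand version

# By hand: copy G-H-I onto integration, then move the label.
git checkout -q -b tmp integration
git cherry-pick integration..manual > /dev/null
git checkout -q manual
git reset -q --hard tmp
git branch -q -d tmp

# The same job in one command.
git checkout -q feature
git rebase -q integration

echo "manual tree:  $(git rev-parse 'manual^{tree}')"
echo "feature tree: $(git rev-parse 'feature^{tree}')"
```

Both labels end up three commits above F with identical snapshots, and the original I is still in the repository under its old hash ID.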

    Anyone who still has the originals will not be fooled! If someone else—some other Git repository—still has the original commits, we must convince them to switch to the new improved commits too. But if we're the only ones with access to the commits, we only have to fool ourselves, which is probably easier.