
Is it okay to rebase after having pushed commits in this instance?


I have a long-lived development branch off the master branch where all modifications are done until this branch is merged back to the master branch. Occasionally, however, a critical fix will be cherry-picked and applied to the master branch rather than waiting for the full merge. Modifications to the development branch are committed and pushed to a remote repository multiple times during the development cycle. When the merge back to the master is ultimately done, it is not unusual that a merge commit is created because of previous cherry-picking.

I know in general you should not rebase a branch that has commits that have been pushed to a remote repository that others are pulling from. But following the merge, the development and master branches are essentially identical except for having different heads. But if I rebase my development branch onto the master branch immediately following the merge, I believe the two branches will have a common head (the merge commit) and none of the commit ids in the development branch will change. By doing so, nobody gets hurt and I can do future merges without being automatically forced to create merge commits.

Is this reasonable?


Solution

  • TL;DR

    There was nothing to rebase, so your rebase itself was OK. It's not necessarily a good idea, nor a bad idea. Had there been something to rebase, everything gets more complicated.

    (Side note: There is a different, and generally better, way to solve the hotfix problem using git merge rather than git cherry-pick, though it has no bearing on your desire and ability to do this sort of rebase. It does have its own drawbacks as well. For more about these, see Stop cherry-picking, start merging. Be sure to read the coda: Stop merging if you need to cherry-pick.)

    Long: Three Key Takeaways

    I'm not sure who will read this long part, but whoever does, there are three key takeaways. The first one is the more complex rebase rule that's just below. The second is that fast-forward is ultimately about reachability, which has a whole web-site devoted to the idea, at Think Like (a) Git. This is worth reading. The last is that rebase works by copying commits, then abandoning the originals in favor of the new-and-improved copies. It's this abandonment, and the non-fast-forwarding that accompanies it—which is required to stop using the outdated commits—that brings in all the woes that result in the simple don't rebase shared branches rule that's often a little too simple.

    (There are some others, including one at the end, that are generally less important except to repository historians.)

    Rebasing shared branches

    I know in general you should not rebase a branch that has commits that have been pushed to a remote repository that others are pulling from.

    That's the simple rule. There's a more complicated variant that says rebase is OK as long as all users who use / will-be-using / are-using this branch are OK with it.

    Defining terms and the gitglossary

    But following the merge, the development and master branches are essentially identical except for having different heads. But if I rebase my development branch onto the master branch immediately following the merge, I believe the two branches will have a common head (the merge commit)

    That can be true, but only in trivial cases like the one you encountered. Moreover, there are some terminology issues here. In particular, we must define "head". If we do it the way the gitglossary does, we need a different term: we need to start using tip commit instead. Here are their definitions of head and branch, and indirectly, of tip commit:

    head

    A named reference to the commit at the tip of a branch. Heads are stored in a file in $GIT_DIR/refs/heads/ directory, except when using packed refs. (See git-pack-refs[1].)

    branch

    A "branch" is an active line of development. The most recent commit on a branch is referred to as the tip of that branch. The tip of the branch is referenced by a branch head, which moves forward as additional development is done on the branch. A single Git repository can track an arbitrary number of branches, but your working tree is associated with just one of them (the "current" or "checked out" branch), and HEAD points to that branch.

    Note, by the way, that HEAD (literal and all uppercase) is very different from "head" (all lowercase). This distinction gets blurred or even lost on case-insensitive file systems such as the defaults on Windows and macOS, but is otherwise crucial: there's only one HEAD, but each branch name is a "head".

    and none of the commit ids in the development branch will change

    In most cases, all commits that are exclusive to development (that is, not already reachable from the updated master) will be copied, and all the new commits that result from this copying will have different hash IDs. If the list of these commits is empty, this copying process will copy zero commits, and all zero of those will have new hash IDs, but since there are zero of them, that doesn't matter. :-)

    Representing this visually, on screen or paper or whiteboard

    To understand what the above definitions of head and branch mean, start with a simple drawing of what a series of commits (loosely, "a branch") looks like in Git. We know that:

    • Each commit saves a full snapshot of your code.

    • Git finds commits (well, objects in general, including commits) by hash ID. A hash ID is a big ugly string such as 7ad088c9a811670756a3fb60ac2dab16b520797b.

    • Each commit has its own unique hash ID.[1]

    • Each commit stores the hash IDs of its parent (if the commit is an ordinary commit) or parents (at least two, usually exactly two, if the commit is a merge commit).

    • The contents of any commit, once made, can never be changed. (In fact, no Git object can ever change once made.[2])

    Therefore, if we start with the latest commit, we can have Git follow each of these parents, one at a time, backwards:

    ... <-F <-G <-H   <--latest
    

    We just need to store the raw hash ID of the latest commit somewhere, so that Git can look up commit H, use it to find hash G, look up that commit to find hash F, and so on back through time. (Eventually, Git will reach the very first commit, which has no parent, because it can't have one, and that lets Git stop.)

    For drawing purposes, since the contents of the commits can't change, we can just connect them with lines, as long as we remember that internally, we can only go backwards (from newer commits to older ones). New commits remember their parent, but existing commits can't have their children added to them when the children get created, because it's too late: the parent is frozen for all time by then. So let's draw a slightly more complicated graph:

    ...--G--H   <-- master
             \
              I--J--K   <-- develop
    

    Here, the parent of commit I is commit H. The name master contains the raw hash ID of H itself; that's how Git can git checkout master. The name develop contains the raw hash ID of K. These are the latest commits: "heads" or branch tips, using the definitions that gitglossary uses.
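
The picture above can be reproduced in a scratch repository. This is a minimal sketch, not part of the original answer: the temporary directory, the one-letter file names, the placeholder identity, and the commit helper are all assumptions made for illustration. It shows that each name resolves to one raw hash ID and that commit I records H as its parent.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # first branch is master, whatever the local default is
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H              # ...--G--H   <-- master
git checkout -q -b develop
commit I; commit J; commit K    #          I--J--K   <-- develop

tip_master=$(git rev-parse master)        # raw hash ID stored under the name master (H)
tip_develop=$(git rev-parse develop)      # raw hash ID stored under the name develop (K)
parent_of_I=$(git rev-parse develop~2^)   # develop~2 is I; ^ follows its parent link

# Walk backwards from the tip of develop, one parent at a time.
git log --format='%h %s (parent: %p)' develop
```

The log lists K, J, I, H, G in that order, because Git can only walk from a commit to its parent.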


    [1] Git makes sure of this by adding, to each commit, the date-and-time-stamp, so that even if you force Git to re-commit the exact same stuff you had just a minute ago, re-using your name, email address, log message, and the same parent hash, the time stamp is different. This does mean you literally can't force Git to make more than one distinct commit per second if you change nothing else, but that's a limitation I, for one, am prepared to live with. :-)

    [2] This is a consequence of the fact that the hash ID of a Git object is, literally, a cryptographic checksum of the data contents of that object. This serves two purposes: it makes it easy to look up the actual data, given a summary checksum; and, it makes it possible to detect data damage, because changing just a single bit of the data results in a new, different checksum.
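
The content-to-hash relationship can be seen with git hash-object, which computes the ID Git would assign to a blob without storing anything. This is only an illustrative sketch; the sample strings are arbitrary choices, not from the original answer.

```shell
set -e
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)   # the same bytes again
id_c=$(printf 'helln\n' | git hash-object --stdin)   # one bit flipped: o is 0x6f, n is 0x6e

echo "hello (first):  $id_a"
echo "hello (second): $id_b"
echo "helln:          $id_c"
```

The first two IDs match because the bytes match; the third differs even though only a single bit changed.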


    The glossary does not match most humans' ordinary everyday word usage

    Gitglossary tries to use the name head for a branch name itself, the word branch to mean the tip commit of the branch plus some or all of the commits behind that tip commit, and tip commit for the commits H and K. Users generally conflate these, lumping all three under the word branch. They may even use that same word—"branch"—to refer to names such as origin/master and/or the commits reachable from such a name. Gitglossary tries to call that a remote-tracking branch. I find that this term causes confusion, and have been using remote-tracking name instead, but am not sure it's much of an improvement.

    For reference below, my own terms are: branch name for a name like master, remote-tracking name for a name like origin/master, tip commit used in exactly the same way as the glossary, and DAGlet for a collection of commits and their linkages, usually found by picking the last commit and working backwards.

    Adding commits

    In the end, it doesn't matter what we call these, as long as we all understand what each other is talking about. Unfortunately, in practice, people have trouble with the last part. So let's illustrate the process of adding a new commit.

    For Git, it's the commit hash IDs, which I am drawing as single uppercase letters here, that really matter. The names (master, develop, origin/master, and so on) are just things humans use to keep track of the hash IDs. Git gives us the ability to update these names, so that they hold the latest hash ID, automatically. We'll start with this:

    ...--G--H   <-- master
             \
              I--J--K   <-- develop
    

    Now we do work, leading to a git commit or a git cherry-pick. We start with:

    git checkout master
    

    to select name master and commit H, and to achieve this, Git attaches HEAD (all uppercase) to master:

    ...--G--H   <-- master (HEAD)
             \
              I--J--K   <-- develop
    

    and at the same time extracts commit H into the index and work-tree so that we can work on / with it.

    Now we do some work and run git commit, or run git cherry-pick something, for instance. The act of running git commit, or other Git commands that make new commits, makes Git update the name so that it now holds the latest commit hash ID. Our new commit will take the next letter L (or in reality, acquire some big ugly hash ID) and we will have this:

    ...--G--H--L   <-- master (HEAD)
             \
              I--J--K   <-- develop
    

    Adding merge commits

    Remember now that Git works backwards, one commit at a time. If we start at K, we'll visit commits K, then J, then I and H and G and so on, skipping L. If we start at L, we'll visit L, then H and G and so on, skipping the entire I-J-K chain. So the only way to encompass both is to work backwards from some new commit that uses both as its parents. That's a merge commit, which we can make by running git merge develop:

    ...--G--H--L------M   <-- master (HEAD)
             \       /
              I--J--K   <-- develop
    

    The merge commit M has two parents. The most-distinguished one is its first parent, which is L, because L was the HEAD commit at the time we ran git merge. This means that if we use the DAGlet we get by starting at M, or any later commit that gets us to M, and working backwards, we'll skip the commits that came in from develop here. That is often precisely what we want: all the commits that were made by working directly on master.
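
Here is a small sketch of that merge in a scratch repository. The temporary directory, one-letter files, placeholder identity, and commit helper are assumptions for illustration only. It checks which commit is the first parent of M and what a first-parent walk from master visits.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
L=$(git rev-parse master)
K=$(git rev-parse develop)

git merge -q -m M develop                  # true merge: master and develop have diverged

first_parent=$(git rev-parse master^1)     # the commit that was HEAD when we merged
second_parent=$(git rev-parse master^2)    # the commit we merged in
mainline=$(git log --first-parent --format=%s master | tr '\n' ' ')

echo "first parent is L:  $([ "$first_parent" = "$L" ] && echo yes || echo no)"
echo "second parent is K: $([ "$second_parent" = "$K" ] && echo yes || echo no)"
echo "first-parent walk:  $mainline"
```

The first-parent walk prints M L H G and skips I, J, and K, which is the "commits made directly on master" view described above.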

    Fast-forward operations

    In Git, at any time, we can point any branch name we like—new or existing—at any commit that currently exists. So now that we have:

    ...--G--H--L------M   <-- master (HEAD)
             \       /
              I--J--K   <-- develop
    

    we can create a new name, such as zorg, to point to commit L or H or J or whatever we like, for whatever reason. Let's pick commit J for no particularly good reason, and make it HEAD as well by doing git checkout zorg during or after we create zorg:

    ...--G--H--L------M   <-- master
             \       /
              I--J--K   <-- develop
                  .
                   .....<-- zorg (HEAD)
    

    Which commits do we get if we start with zorg and work backwards? Since zorg picks J, which points back to I and then H and so on, we get ...--G--H--I--J.

    Now let's forcibly move zorg to point to L instead, updating our index and work-tree at the same time, using git reset --hard <hash-of-L>. Now we have:

                ..........<-- zorg (HEAD)
               .
    ...--G--H--L------M   <-- master
             \       /
              I--J--K   <-- develop
    

    Which commits do we get if we start at zorg and work backwards? Obviously, the sequence ...--G--H--L. Note that commit J is no longer reachable from zorg.

    Now let's make zorg point to commit M, just like master does:

    ...--G--H--L------M   <-- master, zorg (HEAD)
             \       /
              I--J--K   <-- develop
    

    Which commits are reachable now? Let's let Git follow both parents of M, so that we get ...--G-H-(L and I-J-K)-M. So for this particular move, whether we made it from either L or from J, we'd still be able to reach all the commits that we could before, plus some new ones.

    Fast-forwards apply to push and fetch as well

    In graph terms, commits L and J are both ancestors of commit M. This means that moving the label zorg forward—in the direction that's hard for Git on its own—from any of these ancestors, to M, is what Git will call a fast-forward operation. The glossary (incorrectly in my opinion) defines this term for git merge, but it applies to more than just git merge. It's not really a property of git merge at all, but rather of the label motion itself.

    The earlier move from J to L was not a fast-forward, because J is not an ancestor of L. In fact, neither commit is an ancestor of the other, so any move from J to L or vice-versa is a non-fast-forward operation. Fast-forwards occur when the move goes from a commit to one of its descendants. (Because that's hard for Git to test, it actually checks the other way around: You already gave it the descendant commit, so Git works backwards to see if it reaches the starting commit from there.)
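
The ancestor test can be run by hand with git merge-base --is-ancestor. The sketch below rebuilds the same graph in a scratch repository; the directory, file names, identity, and the two helper functions are assumptions for illustration, not anything from the original answer.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
git merge -q -m M develop

J=$(git rev-parse develop~1)
L=$(git rev-parse master^1)
M=$(git rev-parse master)

# Hypothetical helper: classify moving a name from commit $1 to commit $2.
move_kind() {
  if git merge-base --is-ancestor "$1" "$2"; then
    echo fast-forward
  else
    echo non-fast-forward
  fi
}

j_to_l=$(move_kind "$J" "$L")
j_to_m=$(move_kind "$J" "$M")
l_to_m=$(move_kind "$L" "$M")
echo "J -> L: $j_to_l"
echo "J -> M: $j_to_m"
echo "L -> M: $l_to_m"
```

Only the J to L move is reported as a non-fast-forward, matching the zorg example.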

    In particular, suppose, after we made zorg point to J, we ran:

    git push origin zorg
    

    This would have our Git call up the other Git at origin and ask them to create their own branch named zorg, pointing to commit J.[3] Since this is a new name for them, they would say OK and just do it.

    Now we'll do our git reset --hard locally to force zorg to point to L, and try the git push again. This time, they do have a zorg, and theirs identifies commit J. Commit L is not a descendant of J so this git push would fail, with a non-fast-forward error. We'd have to use git push --force to make them take our request—now a command—that they move their zorg in this non-fast-forward manner.

    But, whether or not we do this second push, if we move our zorg to point to M and then run:

    git push origin zorg
    

    again, this time, they'll happily accept the request. That's because this move, from either J or L, to M, is a fast-forward operation. So they will end up with their zorg pointing to commit M, matching our own situation.
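
The three pushes can be tried against a throwaway bare repository standing in for origin. This is a sketch under assumptions: the paths, file names, identity, and commit helper are made up for the example, and git branch -f is used as a shortcut for moving zorg instead of checking it out and running git reset --hard.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"      # stands in for the remote repository
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
git remote add origin "$work/origin.git"

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
git merge -q -m M develop

git branch zorg develop~1                  # zorg -> J
git push -q origin zorg                    # a new name on origin: accepted

git branch -f zorg master~1                # zorg -> L, which is not a descendant of J
if git push -q origin zorg 2>/dev/null; then second=accepted; else second=rejected; fi

git branch -f zorg master                  # zorg -> M, a descendant of both J and L
if git push -q origin zorg; then third=accepted; else third=rejected; fi

origin_zorg=$(git --git-dir="$work/origin.git" rev-parse zorg)
echo "push of L over J: $second"
echo "push of M over J: $third"
```

The second push is refused without --force, while the third goes through because origin's zorg only has to move forward from J to M.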


    [3] If origin did not already have commit J, our Git would send them J and any necessary parent-commits as well.


    Cherry-pick and rebase

    The git cherry-pick command is fundamentally about copying a commit. Unfortunately, a commit is a snapshot, and when we copy one, we don't just want to take that snapshot. One classic example is some hotfix, that might be as simple as fixing a spelling error or removing a naughty word or something. We want to see that as a change, rather than as the version of the code where the fix was first made.

    So git cherry-pick essentially turns a commit into a set of changes, by running git diff between the parent of that commit, and that commit itself.[4] Once we have the changes, we can apply them to some other commit, somewhere else in the full collection of commits, to make a new and different commit, like our commit L above. We'll have Git copy the log message of the cherry-picked commit, but the hash ID of the new commit will be different.
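
A rough way to see this "same change, different commit" effect is to cherry-pick one commit and compare the original with its copy. The scratch repository, file names, identity, and commit helper below are illustrative assumptions. The comparison uses git patch-id, which hashes the diff a commit introduces rather than the commit itself.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master

git cherry-pick develop~1 >/dev/null       # copy J onto master, directly after H

original=$(git rev-parse develop~1)
copy=$(git rev-parse master)
subject_original=$(git log -1 --format=%s "$original")
subject_copy=$(git log -1 --format=%s "$copy")
patch_original=$(git show "$original" | git patch-id | cut -d' ' -f1)
patch_copy=$(git show "$copy" | git patch-id | cut -d' ' -f1)

echo "original: $original  subject: $subject_original"
echo "copy:     $copy  subject: $subject_copy"
```

The two commits carry the same subject and the same patch ID, yet their hash IDs differ because the copy has a different parent.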

    Suppose we stop before we do any merges, i.e., when we still have this:

    ...--G--H--L   <-- master
             \
              I--J--K   <-- develop (HEAD)
    

    If we run git rebase master now, Git will first list out the commits reachable from HEAD—i.e., ...-G-H-I-J-K—then subtract the set reachable from master, ...-G-H-L, leaving the set I-J-K. It will then proceed to copy I to a new-and-improved I', as if by git cherry-pick, with I' going after L:

                 I'  <-- HEAD
                /
    ...--G--H--L   <-- master
             \
              I--J--K   <-- develop
    

    (This happens in "detached HEAD" mode, which is why HEAD points directly to new commit I'.) Then it repeats for J and K:

                 I'-J'-K'  <-- HEAD
                /
    ...--G--H--L   <-- master
             \
              I--J--K   <-- develop
    

    As its final trick, git rebase forces the name develop to move so that it points to the final copied commit, in this case, K', and re-attaches HEAD to the moved develop:

                 I'-J'-K'  <-- develop (HEAD)
                /
    ...--G--H--L   <-- master
             \
              I--J--K   [abandoned]
    

    Note that in this case, the motion was a non-fast-forward. If the origin Git has a develop that points to K, and we now try to send K' (and parents) to origin and ask them to set their develop to point to K', they will refuse with a non-fast-forward error.
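
The rebase-before-merge case can be sketched as follows. As before, the scratch directory, file names, identity, and commit helper are assumptions for illustration. The script counts the commits that rebase will copy, runs the rebase, and then checks whether the old tip K is still an ancestor of the new tip.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
git checkout -q develop

old_tip=$(git rev-parse develop)                   # K
to_copy=$(git rev-list --count master..develop)    # reachable from develop but not from master
git rebase -q master
new_tip=$(git rev-parse develop)                   # K'

if git merge-base --is-ancestor "$old_tip" "$new_tip"; then
  move=fast-forward
else
  move=non-fast-forward
fi
history=$(git log --format=%s develop | tr '\n' ' ')
echo "commits copied: $to_copy"
echo "develop moved by a $move, history is now: $history"
```

Three commits are copied, the tip hash changes, and the move from K to K' is a non-fast-forward, which is why a plain git push of develop would be refused here.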


    [4] The actual mechanism for git cherry-pick is to use a merge. The merge's base commit is the parent of the commit being cherry-picked, so we really do get this diff, but we also get a second diff, against HEAD, followed by a full three-way merge. This merge is concluded by making an ordinary, non-merge commit: that is, cherry-pick does the verb part of git merge, to merge, but not the noun part, because it just makes an ordinary (non-merge) commit.

    Except for tricky cases, though, you can just think of this as "apply the parent-vs-child diff as if it were a patch". And in fact, some kinds of git rebase do the latter, while other kinds of git rebase use git cherry-pick internally! There's no particularly good reason for this: just historical accident, because git cherry-pick was originally implemented without using a proper three-way merge. When this was found to be inadequate for the tricky cases, git cherry-pick itself was improved, but the old git rebase continued to use the old way. All the newer forms of git rebase use the new cherry-pick (because it's almost always the-same-or-better), but for backwards compatibility, the oldest form of rebase still uses the old way.


    If we merge first, it's a fast-forward!

    But suppose we wait, and let merge commit M go in first, so that we start instead with this:

    ...--G--H--L------M   <-- master (HEAD)
             \       /
              I--J--K   <-- develop
    

    Then we do:

    git checkout develop
    git rebase master
    

    This time, when Git lists the commits that are on develop that are not reachable from master, there are none. From M, Git reaches K via its second parent, so master has all the commits already. The rebase operation therefore starts by copying no commits, putting all zero of them after M:

    ...--G--H--L------M   <-- master, HEAD
             \       /
              I--J--K   <-- develop
    

    The last act of git rebase is to force the name develop up to the last copied commit, which in this case really means M, and re-attach HEAD:

    ...--G--H--L------M   <-- master, develop (HEAD)
             \       /
              I--J--K
    

    We'd get exactly the same effect if we ran git checkout develop; git merge master: Git would move the name develop forward, in a fast-forward operation, so that develop points to commit M. We can now git push origin develop because their develop is at K and moving to M is a fast-forward, which is allowed.
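
The merge-first case from the question can be checked the same way. This is a sketch under the same assumptions as the other examples (scratch directory, one-letter files, placeholder identity, hypothetical commit helper). It confirms that nothing is left to copy and that develop simply moves forward to M.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
git merge -q -m M develop                          # merge first

git checkout -q develop
old_tip=$(git rev-parse develop)                   # K
to_copy=$(git rev-list --count master..develop)    # nothing is exclusive to develop now
git rebase -q master
new_tip=$(git rev-parse develop)

if git merge-base --is-ancestor "$old_tip" "$new_tip"; then
  move=fast-forward
else
  move=non-fast-forward
fi
echo "commits copied: $to_copy"
echo "develop moved by a $move"
```

Zero commits are copied, develop ends up on the same commit as master, and K keeps its hash ID and stays reachable, so a plain push of develop would be accepted.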

    If we now make new commits on develop, they'll look like this:

    ...--G--H--L------M   <-- master
             \       / \
              I--J--K   N--O   <-- develop (HEAD)
    

    which is of course fine. However, if we skip that fast-forward and leave develop at K, that's fine too:

    ...--G--H--L------M   <-- master
             \       /
              I--J--K--N--O   <-- develop (HEAD)
    

    The difference between the two approaches

    The key difference here is that if we don't fast-forward develop, the parent of N is K instead of M, which means that we can follow the history of develop linearly from O to N to K to J and so on. With the merge in place, we need to know to go O to N to M, then down the second parent of M (ignoring the first) to K and J and so on.

    If you're going to do a lot of checking on history (perhaps for bug-hunting-and-fixing, perhaps just out of historical interest), the straight-line, never-fast-forwarded method gives you the advantage that you can use --first-parent (the Git flag that says "at merges, only follow the first parent") to make your future job easy. If you're never going to do that, this difference makes no difference at all.

    There's one more alternative, that has relatively little usefulness but is worth considering. Suppose, after making merge M, you make a true merge on develop like this:

    git checkout develop
    git merge --no-ff master
    

    What you get here is a commit N that we can draw like this:

    ...--G--H--L------M   <-- master
             \       / \
              I--J--K---N   <-- develop (HEAD)
    

    where the first parent of N is K, and the second is M. The hash IDs of M and N will differ, while the saved snapshots of each should be the same.[5] This means you can do the same history-searching trick as before, when you didn't merge at all, but you also show that future development on develop starts out with the same code as the main line master.
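
A short sketch of this third alternative, with the usual assumptions (scratch directory, one-letter files, placeholder identity, hypothetical commit helper). It creates N with --no-ff and then compares the parents and saved trees of M and N.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: add one small file and commit it under a one-letter message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git checkout -q -b develop
commit I; commit J; commit K
git checkout -q master
commit L
git merge -q -m M develop

git checkout -q develop
K=$(git rev-parse develop)
M=$(git rev-parse master)
git merge -q --no-ff -m N master           # force a real merge commit instead of a fast-forward
N=$(git rev-parse develop)

first_parent=$(git rev-parse develop^1)
second_parent=$(git rev-parse develop^2)
tree_M=$(git rev-parse 'master^{tree}')    # snapshot saved in M
tree_N=$(git rev-parse 'develop^{tree}')   # snapshot saved in N

echo "M: $M  tree: $tree_M"
echo "N: $N  tree: $tree_N"
```

M and N have different hash IDs but the same tree, and a first-parent walk from develop still goes N, K, J, I.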

    (In practice, there's little need for this—just pick one of the other two methods—but that's what you get if you do it.)


    [5] I say should be because it's possible for any human operating Git to force some sort of difference in here. It's usually a bad idea to do that, though. Just keep that, too, in mind if you're inspecting some foreign Git repository that you know little about: if you see this pattern, you can compare the trees for merges M and N to see if someone did something weird.