Tags: git, gitlab, git-merge, git-revert

Merged Master into branch then committed and pushed changes to branch. How can this be undone without a force push?


I made a mistake.

I have a branch (A) that is branched off of Master. Master is ahead of A by quite a bit. I accidentally merged Master into A a few days ago and pushed. I noticed my mistake later that day, but wasn't sure how to fix it, so I attempted to add some feature flags to turn off things that shouldn't be enabled in A, and pushed them. Later, I decided to try to revert the A branch to get rid of all of the Master commits. I went through all of the changes (about 100 files), and now A looks like it did before I merged Master. My problem now, however, is that I can't merge A into Master without the merge trying to delete all of the changes that exist in Master. (I.e., new files created in Master were removed in the revert for A, so now git wants to remove the files from Master if I try to merge A into Master.)

How can I fix my monumental screwup and just get back to where I can do maintenance patches on A and merge with Master accordingly so that future versions don't lose the patches?


Solution

  • The short answer to "How can I undo a merge without force-push?" is: you can't.

    The longer answer is: you can't, but you don't need to either, provided you know what you are doing and how merging works. It's just sometimes more convenient to force-push, if you can get all the other users of that repository to go along with it.

    TL;DR: if you really need to revert a merge, do it; you can revert the revert later.

    See "How to revert a merge commit that's already pushed to remote branch?" See also MK446's answer to the same question, which is pretty much a copy-paste of Linus Torvalds's description of reverting the revert of a merge.

    Understanding all of this (long)

    The key to understanding why this is the case, and what to do about it, is to realize that the "merge-ness" of any set of commits is inherent in the commits themselves. Branch names merely serve as ways to find the commits. The act of doing a force-push is a method by which you change where the name points so that people (and Gits) can no longer find some commit(s).

    It's easy to see once you get it, but I still do not know how to explain it properly, other than to convince people to draw graphs. Linus Torvalds has summarized it this way—which is accurate, but tricky:

    [While] reverting a merge commit ... undoes the data that the commit changed, ... it does absolutely nothing to the effects on history that the merge had. So the merge will still exist, and it will still be seen as joining the two branches together, and future merges will see that merge as the last shared state - and the revert that reverted the merge brought in will not affect that at all. So a "revert" undoes the data changes, but it's very much not an "undo" in the sense that it doesn't undo the effects of a commit on the repository history. So if you think of "revert" as "undo", then you're going to always miss this part of reverts. Yes, it undoes the data, but no, it doesn't undo history.

    "History" is the commit graph. The graph is determined by the commits, but we find the commits by branch names. So we can alter what we can see by changing the hash IDs stored in the names. But until you know, and see in your own head, how this works, this does not really help.

    You might spend some time looking over the tutorial at Think Like (a) Git, but for a fast review, consider these facts:

    • A Git commit consists of two parts: its main data, which is a snapshot of all of your files—we'll say little more here about this—and its metadata, which contains information about the commit itself. Most of the metadata is stuff for your own information later: who made the commit, when, and their log message telling you why they made that commit. But one item in the metadata is for Git itself, and that is a list of parent commit hash IDs.

    • Everything stored inside any Git commit (indeed, inside any Git object, though mostly you deal directly with commit objects) is totally read-only. The reason is that Git finds each object by a hash ID computed from the object's contents: Git has a big key-value database storing these objects, where the keys are the hash IDs and the values are the objects' contents. Changing even one bit of an object would produce a different hash ID, and hence a different object. Each key uniquely identifies one object, and every commit is distinct,[1] so every commit has a unique hash ID.[2]

    • Hence the hash ID of a commit is, in effect, the "true name" of that commit. Whenever we have that hash ID stored somewhere, e.g., in a file, or a row in a spreadsheet, or whatever, we say that this entry points to the commit.

    • The parent hash ID(s) stored in each commit therefore point to previous commits. Most commits have just one parent hash ID; what makes a commit a merge commit is that it has two or more parent hash IDs. Git makes sure that whenever anyone makes a new commit, the parent hash ID(s) listed in that commit are those of existing commits.[3]
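    To make these facts concrete, here is a toy model. It is a sketch under loud assumptions, not Git's real object format: the `Commit` class and the `hash_id`, `put`, and `is_merge` helpers are hypothetical names for illustration. It shows that the ID is a hash of the whole content (parent IDs included), that a frozen object cannot be edited, and that two otherwise-identical commits made at different times get different IDs.

```python
import hashlib
from dataclasses import dataclass


# Toy commit object (hypothetical; Git's real format differs).
# frozen=True makes every field read-only after creation.
@dataclass(frozen=True)
class Commit:
    snapshot: tuple   # (path, content) pairs standing in for the saved files
    author: str
    timestamp: int
    message: str
    parents: tuple    # hash IDs of the parent commit(s)


def hash_id(c: Commit) -> str:
    # The key is computed from *all* of the content, parents included.
    return hashlib.sha1(repr(c).encode("utf-8")).hexdigest()


store = {}  # the key-value database: hash ID -> commit object


def put(c: Commit) -> str:
    key = hash_id(c)
    store[key] = c
    return key


def is_merge(c: Commit) -> bool:
    # A merge commit is simply one with two or more parents.
    return len(c.parents) >= 2


root = put(Commit((("README", "hello"),), "me", 1000, "first", ()))
child = put(Commit((("README", "hello!"),), "me", 1001, "second", (root,)))

# Same data, different timestamp: a different commit with a different ID.
later = Commit((("README", "hello"),), "me", 2000, "first", ())
print(hash_id(later) != root)            # True
print(store[child].parents == (root,))   # True: child points back to root
```

    Trying `store[root].message = "edited"` raises `dataclasses.FrozenInstanceError`, which mirrors (in a much simplified way) why existing commits can never be changed, only replaced by new ones.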

    The result of all of this is that most ordinary commits point backwards in a simple linear fashion. If we draw a series of commits, replacing the real hash IDs with single uppercase letters, with newer commits towards the right, we get:

    ... <-F <-G <-H
    

    where H stands in for the hash ID of the last commit in the chain. Commit H points to (contains the raw hash ID of) its parent commit G; commit G points to earlier commit F; and so on.

    Because the hash IDs look pretty random,[4] we need some way to find the last commit in the chain. The alternative is to look at every commit in the repository, build up all the chains, and use that to figure out which commit(s) are "last".[5] That's much too slow, so Git gives us branch names. A branch name like master or dev simply points to one commit. Whatever commit the name points to, we decree that this is the tip commit of the branch. So given:

    ...--F--G--H   <-- master
    

    we say that commit H is the tip commit of branch master.[6] We say that all these commits are contained in the branch master.
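    In code, a branch name is nothing more than an entry storing one hash ID, and the history is whatever we find by following parent links backwards from it. A minimal sketch, with single letters standing in for real hash IDs as in the drawing (the `parents` and `branches` dicts and `walk_back` are hypothetical names):

```python
# Each commit ID maps to the list of its parents' IDs.
parents = {"F": [], "G": ["F"], "H": ["G"]}

# A branch name just holds the hash ID of its tip commit.
branches = {"master": "H"}


def walk_back(tip):
    """Yield commits from tip backwards, following first parents.

    In a simple linear chain, the first parent is the only parent.
    """
    commit = tip
    while commit is not None:
        yield commit
        ps = parents[commit]
        commit = ps[0] if ps else None


print(list(walk_back(branches["master"])))  # ['H', 'G', 'F']
```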

    More than one name can point to any one particular commit. If we have:

    ...--G--H   <-- dev, master
    

    then both names, dev and master, identify commit H as their branch-tip commit. Commits up through and including H are on both branches. We'll git checkout one of these names to start using commit H; if we then add a new commit, the new commit will have commit H as its parent. For instance, if we add a new commit while "on" branch master, the new commit will be commit I, which we might draw like this:

              I   <-- master (HEAD)
             /
    ...--G--H   <-- dev
    

    The special name HEAD can be attached to one branch name—just one at a time; it indicates which branch-name new commits update, as well as showing us which commit is our current commit and which branch name is our current branch.

    Adding another commit to master, then checking out dev, gets us this:

              I--J   <-- master
             /
    ...--G--H   <-- dev (HEAD)
    

    The current commit is now H again, and the current branch is dev.
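    The bookkeeping in these drawings can be sketched the same way. This is a simplified model under stated assumptions: HEAD is represented as the name of the branch it is attached to, and new commits take caller-supplied letters instead of computed hash IDs (`make_commit` and `checkout` are hypothetical helpers, not Git's implementation).

```python
parents = {"G": [], "H": ["G"]}
branches = {"master": "H", "dev": "H"}  # two names, one commit
head = "master"  # HEAD is attached to exactly one branch name


def make_commit(new_id):
    """New commit's parent is the current commit; then move HEAD's branch."""
    tip = branches[head]
    parents[new_id] = [tip]
    branches[head] = new_id  # only the branch HEAD is attached to moves


def checkout(name):
    """Attach HEAD to another name; the current commit is whatever it points to."""
    global head
    head = name


make_commit("I")
make_commit("J")
checkout("dev")
print(head, branches[head])   # dev H
print(branches["master"])     # J
```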


    [1] That's one reason commits have date-and-time stamps. Even if two commits are otherwise identical, if they're made at different times, they have different timestamps and are therefore different commits. If you make the exact same commit twice at the exact same time, you only made one commit ... but if you did exactly the same thing many times at the exact same time, did you actually do many things, or only one thing? 😀

    [2] By the Pigeonhole Principle, if the space of "all commits" is larger than the space of "commit hash IDs" (and it is), there must be multiple different commits that resolve to the same hash ID. Git's answer to that is partly "you can't use those other commits" but also "so what, it never happens in practice". See also "How does the newly found SHA-1 collision affect Git?"

    [3] Failure to do this can result in a broken Git repository, with incorrect "connectivity". Whenever you see a Git message about "checking connectivity", Git is doing this kind of checking. Some new Git work is deliberately weakening these connectivity checks, but even if Git doesn't check sometimes, the rules are still there in principle, at least.

    [4] Of course, they're entirely deterministic (they're currently SHA-1 hashes), but they are sufficiently unpredictable to look random.

    [5] Both git fsck and git gc do just this, in order to figure out if there are some commits that can be discarded. The git fsck command will tell you about them: they're dangling and/or unreachable commits. The git gc command will remove them, provided other conditions are right. In particular, they need to have aged past an expiration time. This avoids having git gc delete a commit that's still being built: commits and other objects can be unreachable simply because the Git command that's creating them isn't finished yet.

    [6] This leaves us with a conundrum of sorts: the word branch, in Git, is ambiguous. Does it mean branch name, or does it mean tip commit, or does it mean some set of commits ending with a specified commit? If it means the latter, does the specification have to be a branch name? The answer to this question is often just yes: the word branch can mean all of these, and perhaps more. See also What exactly do we mean by "branch"? So it's best to use a more-specific term whenever possible.


    Merging

    Now that we're on dev and commit H, we can add two more commits to produce:

              I--J   <-- master
             /
    ...--G--H
             \
              K--L   <-- dev (HEAD)
    

    At this point, we can git checkout master and then git merge dev. If commits are Git's raison d'être, Git's automatic merging is a significant reason we all use Git, rather than some other VCS.[7] What git merge does is perform a three-way merge, combining a merge base snapshot with two tip commit snapshots.

    The merge base is determined entirely by the commit graph. It's easy to see in this particular graph, because the merge base is the best commit that's on both branches.[8] So what git merge will do is:

    • compare the snapshot in the merge base commit H with the snapshot in our current branch tip commit, to see what we changed; and
    • compare the snapshot in the merge base commit H with the snapshot in their branch tip commit, to see what they changed,

    and then simply (or complicatedly, if necessary) combine these two sets of changes. The combined changes can now be applied to the base snapshot, i.e., the files as saved for all time in commit H.
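    The "best commit that's on both branches" can be computed directly from the graph. The sketch below is a brute-force version of that definition (Git's own git merge-base uses a more efficient walk): collect everything reachable from each tip, intersect, and keep only the common commits that are not ancestors of another common commit. The dict and function names are illustrative.

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # commits on master
    "K": ["H"], "L": ["K"],   # commits on dev
}


def ancestors(c):
    """Every commit reachable from c by following parent links, c included."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def merge_bases(a, b):
    common = ancestors(a) & ancestors(b)
    # "Best" = not a proper ancestor of some other common commit.
    return {c for c in common
            if not any(c in ancestors(d) - {d} for d in common)}


print(merge_bases("J", "L"))  # {'H'}
```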

    The result of combining the two changesets is either success (a new snapshot ready to go into a new commit) or a merge conflict. The conflict case occurs whenever Git can't combine our changes and their changes on its own. If that happens, Git stops in the middle of the merge, leaving a mess behind, and our job becomes to clean up the mess, provide the correct final snapshot, and then tell Git to continue: git merge --continue or git commit (both do the same thing).
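    Here is a whole-file simplification of the combining step, assuming each snapshot is a `{path: content}` dict. Real Git also merges line by line inside a file that both sides changed; this hypothetical `three_way` function just reports such files as conflicts, and uses `None` to mean "file absent".

```python
def three_way(base, ours, theirs):
    """Toy whole-file three-way merge over {path: content} snapshots."""
    result, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (possibly both deleted it)
            merged = o
        elif o == b:      # only they changed (or deleted) it
            merged = t
        elif t == b:      # only we changed (or deleted) it
            merged = o
        else:             # both changed it, differently
            conflicts.append(path)
            continue
        if merged is not None:  # None means the file is gone
            result[path] = merged
    return result, conflicts


base = {"a.txt": "1", "b.txt": "1", "new.txt": "x"}
ours = {"a.txt": "2", "b.txt": "1", "new.txt": "x"}
theirs = {"a.txt": "1", "b.txt": "3"}   # they deleted new.txt
print(three_way(base, ours, theirs))    # ({'a.txt': '2', 'b.txt': '3'}, [])
```

    Note the deletion case: the base has new.txt, our side left it alone, their side removed it, so the removal wins. That is the situation in the question. Once Master was merged into A, the merge base for merging A back into Master contains Master's new files, so A's revert looks like a deliberate deletion of any of those files that Master hasn't touched since.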

    Having successfully combined changes—perhaps with our help—Git now makes a new commit. This new commit is just like any other commit, in that it has a snapshot for its data, and has some metadata giving our name and email address, the current date-and-time, and so on. But it's special in exactly one way: it has, as its parents (plural), the hash IDs of both of the two tip commits.

    As always with any commit, the act of making the commit updates the current branch name, so we can draw the result like this:

              I--J
             /    \
    ...--G--H      M   <-- master (HEAD)
             \    /
              K--L   <-- dev
    

    Remember that we started the process with git checkout master, so the current commit was J and the current branch name was, and still is, master. The current commit is now merge commit M, and its two parents are, in order, J—this first parent-ness of J can be used later if you wish—and L.
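    Continuing with a toy model (all names hypothetical), a merge commit is an ordinary commit whose parent list holds two IDs, the current tip first. Following only first parents, roughly what git log --first-parent does, walks back along master's own line:

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}
branches = {"master": "J", "dev": "L"}
head = "master"

# "git merge dev" while on master: parents are [our tip, their tip], in order.
parents["M"] = [branches[head], branches["dev"]]
branches[head] = "M"  # only the current branch name moves


def first_parent_history(c):
    """Walk back from c, always taking the first parent."""
    while c is not None:
        yield c
        c = parents[c][0] if parents[c] else None


print(parents["M"])                     # ['J', 'L']
print(list(first_parent_history("M")))  # ['M', 'J', 'I', 'H', 'G']
```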


    [7] Many pre-Git VCSes had built-in merging, but not so many had merging that was as clever and automatic. There were and are other good version control systems, then and now, but Git also added distributed version control and, with GitHub and other sites, won the network effect. So now we're stuck with Git. 😀 Mercurial is pretty clearly better than Git in terms of user-friendliness, and Bitbucket used to be a Mercurial-only site, but now it ... isn't.

    [8] Here, we take the word branch to mean set of commits reachable from the current branch-tip. We know that the branch names will move around later: at some point in the future, master won't name commit J and/or dev won't name commit L, but right now they do. So we find commits reachable from J and working backwards, and commits reachable from L and working backwards, and when we do that, the obvious best commit that's on both branches is commit H.


    Sidebar: git merge doesn't always merge

    Under one particular (but common) condition, git merge won't make a merge commit unless you force it to do so. In particular, suppose the best shared commit on two branches is the last commit on the "behind" branch. That is, suppose we have:

    ...--o--B   <-- br1 (HEAD)
             \
              C--D   <-- br2
    

    where the parent of D is C, the parent of C is B, and so on. We have br1 checked out, as indicated by HEAD here. If we run git merge br2, Git will find commits B and D as usual, work backwards from D to C to B, and discover that the best shared commit (the best commit on both branches) is commit B, which is also the current commit.

    If we did a real merge at this point, Git would compare the snapshot in B vs the snapshot in B: base vs HEAD is B vs B. Obviously there are no changes here. Then Git would compare the snapshot in B vs that in D. Whatever these changes are, Git would apply these changes to the snapshot in B. The result is ... the snapshot in D.

    So if Git were to do a real merge at this point, it would produce:

    ...--o--B------M   <-- br1 (HEAD)
             \    /
              C--D   <-- br2
    

    where the snapshot in M would exactly match the snapshot in D.

    You can force Git to do a real merge using git merge --no-ff, but by default, Git will "cheat". It will say to itself: The merge snapshot would match D, so we can just make the name br1 point directly to commit D. So git merge will simply git checkout D, but also slide the name br1 "forward" to point to commit D:

    ...--o--B
             \
              C--D   <-- br1 (HEAD), br2
    

    If you use GitHub to do your merges, note that GitHub always forces a real merge so that you never get a fast-forward.[9]
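    The decision git merge makes in this sidebar can be sketched as follows, assuming the same toy graph representation (the `merge` function and the fixed ID `"M"` are hypothetical, and snapshots are left out). If their tip is already reachable from ours there is nothing to do; if our tip is reachable from theirs, the name simply slides forward unless a real merge is requested, as with --no-ff.

```python
parents = {"o": [], "B": ["o"], "C": ["B"], "D": ["C"]}
branches = {"br1": "B", "br2": "D"}


def is_ancestor(a, b):
    """True if commit a is reachable from commit b (a == b counts)."""
    seen, stack = set(), [b]
    while stack:
        x = stack.pop()
        if x == a:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False


def merge(current, other, no_ff=False):
    ours, theirs = branches[current], branches[other]
    if is_ancestor(theirs, ours):
        return "already up to date"
    if is_ancestor(ours, theirs) and not no_ff:
        branches[current] = theirs  # fast-forward: no new commit at all
        return "fast-forward"
    parents["M"] = [ours, theirs]   # real merge commit (snapshot omitted)
    branches[current] = "M"
    return "merge"


print(merge("br1", "br2"))  # fast-forward
print(branches)             # {'br1': 'D', 'br2': 'D'}
```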


    [9] The closest you can get is to use GitHub's rebase and merge mode, but this copies the commits that are otherwise fast-forward-merge-able. It gives them new committer name-and-email-and-time-stamps, and the resulting commits have new hash IDs. So it's never a real fast-forward. This is sometimes annoying, and I wish they had a real fast-forward option.


    It's the existence of the merge commit itself that matters for future merges

    Suppose we've done this pattern for a while, and have:

    ...--o--o--o------A-----M   <-- master
             \       /     /  
              o--o--o--o--B--C--D   <-- dev
    

    Which commit is the merge base of master and dev? There's a big hint here: it is one of the lettered commits, rather than the more boring historical o commits.

    The tricky part is that to find a merge base, when we walk backwards from a branch tip commit, we should visit both parents simultaneously, as it were. So merge commit M has two parents, A and B. Meanwhile, starting at D and working backwards, we also arrive at commit B (after two hops). So commit B is the merge base.

    The reason B is the merge base is the existence of merge commit M. The name master points to M and M points back to two commits, A and B. Commit B is "on" (contained in) branch master, and is clearly on/contained-in branch dev, and this will continue to be true as long as master points to a commit that either is commit M, or reaches (by some chain of commits) merge M.

    Git normally only ever adds commits to branches, sometimes one at a time by committing, and sometimes many at once by merging or fast-forwarding. Once commit B becomes "on" (contained in) branch master via commit M, it will continue to be on/contained-in master. A future merge might find a better commit than commit B, but as long as these commits continue to be on master and dev, commit B will always be a merge-base candidate.
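    A brute-force version of the merge-base definition (everything reachable from both tips, minus any commit that is an ancestor of another such commit) confirms that B is the answer for this drawing. The plain o commits need names in code, so this sketch assumes hypothetical labels m1 to m3 for the top row and d1 to d4 for the bottom row, left to right.

```python
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],
    "d1": ["m2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
    "A": ["m3", "d3"],        # an earlier merge of dev into master
    "B": ["d4"], "C": ["B"], "D": ["C"],
    "M": ["A", "B"],          # the merge that puts B on master
}
branches = {"master": "M", "dev": "D"}


def ancestors(c):
    """Every commit reachable from c, visiting all parents of merges."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def merge_bases(a, b):
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c in ancestors(d) - {d} for d in common)}


print(merge_bases(branches["master"], branches["dev"]))  # {'B'}
```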

    This is why you can't undo a merge easily

    So this is why you can't "undo a merge" without force-pushing. You can change the snapshots in new commits—that's what git revert is about, for instance—but you can't change the history of the existing commits. The history is the set of commits found by walking through the graph, and all existing commits are frozen for all time and remain in the graph as long as they can be found:

    ...--o--o--o------A-----M   <-- master
             \       /     /  
              o--o--o--o--B--C--D   <-- dev
    

    The history for master is commit M, then both commits A and B, then their parents and so on. The history for dev is commit D, then commit C, then commit B, and so on.
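    To see why git revert cannot change this, here is the same toy graph (hypothetical labels for the o commits) with snapshots reduced to sets of file names. Suppose we run git revert -m 1 M on master, producing a new commit R whose parent is M and whose snapshot matches A's. The data goes back, but R sits on top of M, so M, and therefore B, stay in master's history:

```python
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],
    "d1": ["m2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
    "A": ["m3", "d3"], "B": ["d4"], "C": ["B"], "D": ["C"],
    "M": ["A", "B"],
}
# Only the snapshots this example needs (file names only, contents ignored).
snapshot = {"A": {"x"}, "M": {"x", "y"}}

# git revert -m 1 M: a NEW commit on master whose snapshot is A's, parent M.
parents["R"] = ["M"]
snapshot["R"] = set(snapshot["A"])


def ancestors(c):
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def merge_bases(a, b):
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c in ancestors(d) - {d} for d in common)}


print(snapshot["R"] == snapshot["A"])  # True: the data is undone
print(merge_bases("R", "D"))           # {'B'}: the history is not
```

    So a later git merge dev on master still uses B as the base and brings in only what changed between B and D; whatever R took out stays out unless someone reverts R, which is the "revert the revert" mentioned in the TL;DR.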

    To change the history as seen from master, you must convince Git to stop walking through commit M. If you use force-push to remove M from master—it still exists, it's just not findable via master any more—you get:

                        ------M   ???
                       /     /
    ...--o--o--o------A     /   <-- master
             \       /     /
              o--o--o--o--B--C--D   <-- dev
    

    (Note that there's no name that finds M in this drawing, so eventually, git gc will discard commit M entirely. See also footnote 5.)
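    Finding commits that no name can reach is the kind of graph search footnote 5 describes. A minimal sketch, using hypothetical labels for the o commits and treating only branch names as starting points; real Git also starts from tags, other refs, and reflog entries, so M can stay findable for a while before git gc removes it.

```python
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],
    "d1": ["m2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
    "A": ["m3", "d3"], "B": ["d4"], "C": ["B"], "D": ["C"],
    "M": ["A", "B"],
}
branches = {"master": "A", "dev": "D"}  # after the force-push


def reachable(starts):
    """All commits reachable from any of the given starting commits."""
    seen, stack = set(), list(starts)
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


unreachable = set(parents) - reachable(branches.values())
print(unreachable)  # {'M'}
```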

    Force-push is how we tell Git: Yes, this operation will render some commit(s) unreachable, and probably lose them forever. We mean for this to happen! By completely removing merge commit M, we get back into a state in which the merge never happened and commit B won't be the merge base next time.

    (Exercise: find the merge base.)
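    If you want to check your answer to the exercise mechanically, a brute-force merge-base search can be run on the post-force-push graph. The o commits get hypothetical labels, m1 to m3 on the top row and d1 to d4 on the bottom row, left to right; M is left out because no name reaches it any more.

```python
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],
    "d1": ["m2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d3"],
    "A": ["m3", "d3"], "B": ["d4"], "C": ["B"], "D": ["C"],
}
branches = {"master": "A", "dev": "D"}


def ancestors(c):
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen


def merge_bases(a, b):
    common = ancestors(a) & ancestors(b)
    # Keep only common commits that are not proper ancestors of other ones.
    return {c for c in common
            if not any(c in ancestors(d) - {d} for d in common)}


print(merge_bases(branches["master"], branches["dev"]))
```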