How to remove specific commit from already rebased branch?


I've opened a PR from feature/orphans into the develop branch. Since then, a lot of important branches have been added to the repo, so I had to rebase onto them.

Now I need to remove one specific commit from my feature branch, and it gets a little tricky for me here. I tried to follow a couple of answers from similar questions (4110978 and 51400593), but it is not working somehow 🤷‍♂️.

I tried this:

git rebase --onto feature/db-optimization-for-orphans~fcf0c4a feature/db-optimization-for-orphans~2505060 feature/db-optimization-for-orphans

fatal: invalid upstream 'feature/db-optimization-for-orphans~2505060'

I want to remove commit fcf0c4a. Here is the log:

* cf83304 - (HEAD -> feature/orphans, origin/feature/orphans) Added orphans collection
* 844eb17 - fix: bugs
* 4f0111f - fix message
* 9093c8d - fix(cat-172): Change data format from object to array
* fcf0c4a - feat(CAT-172): Add new publisher // <---- 👋 I WANT TO REMOVE THIS COMMIT
*   2505060 - (origin/develop, develop) Merge branch 'main' into develop
|\
| * 9600c29 - feat(CAT-111) description
* | 878ee2f - feat(cat-196) description
* | 6b53545 - feat(CAT-18) description
* | 1d2ef2e - saving model name into db + test + version
* | 24eea8e - saving ean values into the db + tests
|/
* 10c205c - Add preprod and prod deployers
* 8a8d0f2 - squashed commits
* 15328b6 - initial commit

Could somebody tell me what the best solution is to remove commit fcf0c4a?


Solution

  • TL;DR

    I think you want git rebase -i origin/develop.

    Long

    I have bad news and good news for you.

    Here is the bad news

    You can only remove a commit from the end of a branch:

    * cf83304 - (HEAD -> feature/orphans, origin/feature/orphans) Added orphans collection
    * 844eb17 - fix: bugs
    * 4f0111f - fix message
    

    This excerpt of your log, for instance, shows the three commits at the end of the branch. I'll shorten their names to just the first "digit" (c for the last one):

    ... <-4 <-8 <-c
    

    Commit cf83304 points backwards to commit 844eb17. That's how git log finds 844eb17.

    Git finds cf83304 in the first place because a branch name finds it:

    cf83304 ... feature/orphans
    

    That is, the name feature/orphans points to cf83304.

    To remove a commit from a branch, we make the branch name point to some earlier commit:

                c   <-- origin/feature/orphans
               /
    ... <-4 <-8   <-- feature/orphans
    

    So commit cf83304 is now shoved aside. (Git can still find cf83304, because the name origin/feature/orphans still points to cf83304.)

    Once we've removed commit c... from the branch, we can remove commit 8... as well:

            8 <-c   <-- origin/feature/orphans
           /
    ... <-4   <-- feature/orphans
    

    and so on.

    So the bad news is this: to remove commit fcf0c4a—which you can do—you must also remove all the subsequent commits from that branch.
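To make "moving the branch name" concrete, here is a minimal sketch in a throwaway repository. The repository, the file notes.txt, and the branch name saved-tip are all made up for illustration; saved-tip plays the role that origin/feature/orphans plays above, keeping the old tip findable.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

# Three commits, named after the last three in the log above.
for msg in "fix message" "fix: bugs" "Added orphans collection"; do
  echo "$msg" >> notes.txt
  git add notes.txt
  git commit -q -m "$msg"
done

# saved-tip is a hypothetical stand-in for origin/feature/orphans:
# a second name that keeps pointing at the old tip.
git branch saved-tip
old_tip=$(git rev-parse HEAD)

# Move the current branch name back by one commit.
git reset -q --hard HEAD~1

git --no-pager log --format=%s               # commits still on the branch
git --no-pager log -1 --format=%s saved-tip  # the "removed" commit, still findable
```

Here git reset --hard is just one way to move the current branch name; it also resets the working files, so it is only safe when there is no uncommitted work.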

    Here is the good news

    Before we "remove" the commits—they don't really go away; if you know their numbers, or there is some other name for them such as origin/feature/orphans, we can still find them—we can copy selected commits to new and improved commits.

    Every commit, in Git:

    • Is numbered. Those big ugly random-looking hexadecimal numbers, fcf0c4a and so on, uniquely locate that one particular commit. Git needs that number to find the commit.

    • Contains two things:

      • Each commit has a full snapshot of every file as it appeared at the time you (or whoever) made the commit. These files are kept in a special, compressed and de-duplicated, read-only and Git-only form: you can't read them, and nothing—not even Git itself—can overwrite them, so these aren't the files you work with. They're just stored forever so that you can get them back later.

      • Each commit contains some metadata, or information about the commit itself. This includes the name and email address of the person who made the commit, for instance. It includes some date-and-time stamps: when you run git log and see the commit with an author and date, these all come from the metadata. It includes the log message you see here too.

        Crucially for Git itself, the metadata in each commit stores the raw hash ID(s)—the commit number(s)—of some list of previous commits. Most commits store exactly one hash ID, which is what you're getting here; a few, like 2505060, are merge commits, which store two hash IDs; and at least one commit in every non-empty repository is the first ever commit, like 15328b6. This first-ever commit doesn't store any previous-commit ID, because there isn't any previous commit.
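If you want to look at this metadata yourself, git cat-file -p prints a raw commit object. The sketch below uses a throwaway repository with made-up file and commit names; it shows that the first commit stores no parent hash ID while the second stores exactly one.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo one > file.txt
git add file.txt
git commit -q -m "initial commit"
root=$(git rev-parse HEAD)

echo two >> file.txt
git add file.txt
git commit -q -m "second commit"

# Raw commit object: tree (the snapshot), parent, author, committer, message.
git cat-file -p HEAD

# %P prints the stored parent hash IDs: none for the root, one for HEAD.
echo "root parents: [$(git --no-pager log -1 --format=%P "$root")]"
echo "HEAD parents: [$(git --no-pager log -1 --format=%P HEAD)]"
```

Running the same %P query on a merge commit such as 2505060 in your repository should print two hash IDs.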

    Except for those oddball special cases (the merge, and the first commit), then, we can draw out a sequence of commits like this:

    ... <-F <-G <-H
    

    where each uppercase letter like H stands in for some actual hash ID. (This is what I did above, except that I was able to use the first character of the real hash IDs.) The letters represent the saved-files-and-metadata, and the arrows coming out of each letter represent the stored previous-commit hash ID: commit H stores the hash ID of earlier commit G. We say that H points to G.

    Now, picking up on the bad-news-good-news theme again, the bad news is that no commit can ever be changed after it's made. This is necessary for a bunch of reasons. One is Git's rather magical hash ID scheme, which is similar to the tricks that power cryptocurrencies. Another is the fact that commits share identical files: if we could change one of these files somehow, that would change all the shared copies. The good news is that the way we (and Git) find these commits is through branch and other names, and we can and do stuff different commit hash IDs into these branch and other names.

    Each name holds just one hash ID. For a branch name, that one hash ID is, by definition, the last commit on the branch. So when we have:

    ...--F--G--H   <-- somebranch
    

    this means that commit H is, by definition, the last commit on the branch. That's how we can move the name to drop a commit, or several commits, off the end:

           G--H
          /
    ...--F   <-- somebranch
    

    Now that somebranch points to F instead of H, F is automatically the last commit on the branch.

    Whenever we make a new commit, we do this with:

    1. git switch branch or git checkout branch;
    2. work on / with the files Git copied out of the commit the branch name selected;
    3. use git add (for reasons I won't go into here); and
    4. run git commit.

    This last step—the git commit step—makes the new commit by:

    • gathering the appropriate metadata: it gets your name and email address from user.name and user.email, for instance;
    • figuring out the hash ID of the current commit, using the current branch name from step 1: if it points to F, that's the current commit;
    • writing out the new snapshot and metadata, with the new commit's arrow pointing back to the current commit; and
    • one last trick ...

    but let's draw the effect of writing out the new commit first:

           G--H   <-- origin/feature/orphans
          /
    ...--F   <-- current-branch (HEAD), some-other-branch
          \
           I
    

    We now have this here new commit I, which got a new, unique, big ugly hash ID. Now that last trick kicks in: Git writes the new commit's hash ID into the current branch name:

           G--H   <-- origin/feature/orphans
          /
    ...--F   <-- some-other-branch
          \
           I   <-- current-branch (HEAD)
    

    This is how branches grow, in Git. We check one out, with git checkout or git switch—in my drawings here that means we attach the special name HEAD to the branch name; you can see that special name in your own git log output—and checking out the commit gets us all the saved files from the commit. Then we do our work as usual and make a new commit. The new commit gets a new unique hash ID and Git stuffs the new commit's hash ID into the current name, and now the name points to the new last commit.
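The "last trick" can be watched directly: read the hash ID stored in the branch name before and after a commit. This is a small sketch in a throwaway repository; the file name and the commit messages F and I are placeholders borrowed from the drawings.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo one > file.txt
git add file.txt
git commit -q -m "F"

branch=$(git symbolic-ref --short HEAD)   # the branch name HEAD is attached to
before=$(git rev-parse "refs/heads/$branch")

echo two >> file.txt
git add file.txt
git commit -q -m "I"

# The name now holds the new commit's hash ID; the new commit points back.
after=$(git rev-parse "refs/heads/$branch")
parent=$(git rev-parse "refs/heads/$branch^")
echo "$branch: $before -> $after (parent of new tip: $parent)"
```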

    How does this help you do what you want?

    Let's draw some of what you have, replacing the big ugly hash IDs with the one-letter uppercase letter names I like, in the form I like to draw it:

    ...--G--H--I--J--K--L   <-- feature/orphans (HEAD), origin/feature/orphans
    

    Here G stands in for 2505060 - (origin/develop, develop) Merge branch 'main' into develop. H stands in for fcf0c4a - feat(CAT-172): Add new publisher: the commit you want to "remove". I stands in for 9093c8d - fix(cat-172): Change data format from object to array, a commit you want to keep. J-K-L are also commits you want to keep.

    The bad news is that you're going to have to eject the commits you wanted to keep. The good news is that you can copy them to new and improved commits first. We're going to end up with:

           H--I--J--K--L   <-- origin/feature/orphans
          /
    ...--G
          \
           I'-J'-K'-L'  <-- feature/orphans (HEAD)
    

    The new commits I'-J'-K'-L' will be carefully arranged copies of the old commits. We're going to make two changes to each copy:

    1. The parent of each copy will point back to the right parent: that is, I' will point directly to G, not to H.
    2. The snapshot files of each copy will drop the changes you made in commit H.

    Now, the clear, but manual and a bit painfully slow, way to do this is to manually copy each commit you want copied, one at a time. We would do this by creating a new temporary branch name pointing to commit G:

           H--I--J--K--L   <-- feature/orphans, origin/feature/orphans
          /
    ...--G   <-- temp-branch (HEAD)
    

    which we do with:

    git switch -c temp-branch 2505060
    

    We are now on this new temporary branch, and the files we can see and work with are those from commit G (or 2505060 to be exact).

    We now want to have Git figure out what we changed in commit I and make those same changes here and now and commit them. Git will copy the commit message from commit I too.

    The Git command that does this simple "copy one commit's changes and commit message" is git cherry-pick, so we would run:

    git cherry-pick <hash-of-I>
    

    The (abbreviated) hash of I is 9093c8d so we could type that in and press ENTER and get:

           H--I--J--K--L   <-- feature/orphans, origin/feature/orphans
          /
    ...--G
          \
           I'  <-- temp-branch (HEAD)
    

    We then have to repeat with three more git cherry-pick commands with the right hash IDs. This copies J to J', then K to K', then L to L':

           H--I--J--K--L   <-- feature/orphans, origin/feature/orphans
          /
    ...--G
          \
           I'-J'-K'-L'  <-- temp-branch (HEAD)
    

    Once we've finished all the git cherry-pick steps, we just have to tell Git: Hey Git, force the name feature/orphans to point to the current commit, which would require using git branch -f. Then we'd git switch feature/orphans to get back on it:

           H--I--J--K--L   <-- origin/feature/orphans
          /
    ...--G
          \
           I'-J'-K'-L'  <-- feature/orphans (HEAD), temp-branch
    

    and then we can delete the name temp-branch entirely since we're done with it.
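Put together, the manual procedure might look like the sketch below. It runs in a throwaway repository where each made-up commit (G, H, I, J) adds its own file, so the cherry-picks cannot conflict; in a real repository you may have to resolve conflicts along the way. The branch name feature stands in for feature/orphans, and git checkout is used where the text uses git switch so the script also runs on older Git versions.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

commit() {  # commit <name>: add a file called <name>.txt in a commit named <name>
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit G                       # stand-in for 2505060
base=$(git rev-parse HEAD)
git checkout -q -b feature     # hypothetical stand-in for feature/orphans
commit H                       # the commit to get rid of
commit I
commit J
i=$(git rev-parse HEAD~1)
j=$(git rev-parse HEAD)

git checkout -q -b temp-branch "$base"   # temporary branch at G
git cherry-pick "$i" >/dev/null          # I  -> I'
git cherry-pick "$j" >/dev/null          # J  -> J'
git branch -f feature temp-branch        # force the old name onto the copies
git checkout -q feature
git branch -q -d temp-branch

git --no-pager log --format=%s           # J, I, G: no H
```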

    The fast way

    Doing all these separate steps—creating a new but temporary branch name, cherry picking commits one by one, forcing the old branch name into place, switching back to the old branch, and deleting the temporary branch—is a big pain in the <insert anatomy part here>. We don't have to do it like this. We have the git rebase command instead.

    The git rebase command is mainly just a fancy way of doing the above, with one command. Because this one command does so many things, it has a lot of pieces, and I think that's where you're running into issues with rebase.

    You have a lot of options here—there are many ways to run git rebase—but the one I generally use myself for this kind of case is called interactive rebase. You run it like this:

    git switch feature/orphans     # if you're not already there
    git rebase -i origin/develop
    

    The name origin/develop here is any name—branch or other name—that selects the place you want the commits to go. You can use a raw hash ID if you like (git rebase -i 2505060), but we want to pick out the commit I have been calling "commit G". This is where the copies should go.

    The git rebase command will now work out which commits to copy by listing the commits that you have now, excluding those reachable from commit G. Without going into what this all means, the short version is that this lists commits H-I-J-K-L. This is one too many commits, but that's OK! Having listed out these commit hash IDs, the -i in git rebase -i means Now that you've listed out the commits to copy, make up an instruction sheet with the word pick in front of each hash ID.
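You can preview roughly which commits would land on the sheet with a two-dot range. The sketch below is only an approximation under stated assumptions: a throwaway repository with made-up commits G, H, I, J, and git log formatted to look like pick lines. The real rebase also leaves out merge commits and commits whose changes are already upstream, which this simple listing does not model.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

commit() {  # commit <name>: add a file called <name>.txt in a commit named <name>
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit G                      # stand-in for origin/develop (2505060)
base=$(git rev-parse HEAD)
git checkout -q -b feature    # hypothetical stand-in for feature/orphans
commit H
commit I
commit J

# Commits reachable from feature but not from the base, oldest first.
git --no-pager log --reverse --format='pick %h %s' "$base"..feature
```

In your repository the equivalent query would be git log --reverse --oneline origin/develop..feature/orphans.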

    This instruction sheet will therefore read:

    pick fcf0c4a feat(CAT-172): Add new publisher
    pick 9093c8d fix(cat-172): Change data format from object to array
    

    and so on for the remaining three commits. Now, because of the -i, git rebase opens your editor on this instruction sheet. Your job right now is to adjust these instructions and then write this out and exit your editor.[1] In your particular case, your job is to change or delete the pick command for commit H, the commit you don't want. If you change this to drop or d, or if you simply delete the entire line, git rebase will not copy commit H after all.

    Once you write out the instruction sheet, git rebase will proceed to execute the remaining pick instructions, running git cherry-pick for each commit that needs to be copied. This gets you the I'-J'-K'-L' commits. Then git rebase finishes up by moving the name feature/orphans to point to the final copied commit, L':

           H--I--J--K--L   <-- origin/feature/orphans
          /
    ...--G
          \
           I'-J'-K'-L'  <-- feature/orphans (HEAD)
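For a reproducible demonstration, the editing step can be scripted by setting GIT_SEQUENCE_EDITOR to a command that deletes the unwanted pick line. The sketch below does that in a throwaway repository with made-up commits, and then shows a second variant using git rebase --onto, which is the shape the command in the question was reaching for. (That attempt failed because a suffix like ~2505060 is not valid: ~ expects a generation count such as ~1, not a hash ID.) The branch names feature and feature-alt and all commit messages are assumptions for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

commit() {  # commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit g.txt "G: merge main into develop"
base=$(git rev-parse HEAD)
git checkout -q -b feature          # hypothetical stand-in for feature/orphans
commit h.txt "H: add new publisher"
unwanted=$(git rev-parse HEAD)
commit i.txt "I: change data format"
commit j.txt "J: fix message"
git branch feature-alt              # second copy of the branch for variant 2

# Variant 1: interactive rebase, with sed standing in for the human editor.
# It deletes the pick line for H from the instruction sheet.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/H: add new publisher/d'" \
  git rebase -q -i "$base"

# Variant 2: no instruction sheet. Copy everything after $unwanted onto $base.
git rebase -q --onto "$base" "$unwanted" feature-alt

git --no-pager log --format=%s feature
```

With the hash IDs from the question, variant 2 would presumably read git rebase --onto 2505060 fcf0c4a feature/orphans.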
    

    You now have in your repository the set of commits you want, but there's one thing left to do.


    [1] Some editors don't really "exit": they need to communicate to Git that they're done writing the file. This is another place where you can hit a stumbling block. But Git already has this problem with git commit, if you don't use the -m flag, and generally you should not be using the -m flag. So you should have solved this already, if you have one of these tricky editors.


    You now need to use git push --force-with-lease

    You've sent commits H-I-J-K-L to some other Git repository, one that you call up using the name origin. You had that other Git repository create or update their branch name feature/orphans. Your own Git reflects this by remembering their feature/orphans as your origin/feature/orphans.

    You now need to send this other Git repository the I'-J'-K'-L' commits—this part is easy enough—and then convince them that they should drop their H-I-J-K-L chain in favor of your new-and-improved I'-J'-K'-L' chain of commits. This part requires using a force push.

    In general, Git really likes to add new commits to branches. It doesn't like dropping commits off the end: that's usually considered bad or wrong. So you have to force their Git to do that.

    Using git push --force-with-lease origin feature/orphans, you have your Git call up their Git, give them commits I'-J'-K'-L', and then send over a command of the form:

    I think your feature/orphans holds cf83304. If so, I command you to stuff the hash ID of commit L' in there instead. Let me know if I was right and you did that.

    They will either find the right thing and obey, or tell you why they didn't.

    You can use the simpler git push --force. This omits a bit of safety checking, by sending them the command:

    Stuff the hash ID of commit L' in your feature/orphans! Do it now! I command you!

    If for some reason they've picked up an even-newer commit that goes after L, this will drop that commit. If you don't know its hash ID, you can't ever ask for it. By using the "I think ... so do this" construct, if you're wrong you can see what the heck happened before you make them drop a commit that nobody can find again later.
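The whole exchange can be rehearsed locally with a bare repository playing the part of origin. This is a sketch under assumptions: the directory layout, the branch name feature, and the commits G, H, I are invented, and the explicit --force-with-lease=feature:<hash> form is used so that the "I think your branch holds ..." part is spelled out rather than taken from origin/feature.

```shell
set -eu
top=$(mktemp -d)
git init -q --bare "$top/origin.git"   # plays the role of the other Git
git init -q "$top/work"
cd "$top/work"
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false
git remote add origin "$top/origin.git"

commit() {  # commit <name>: add a file called <name>.txt in a commit named <name>
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit G
base=$(git rev-parse HEAD)
git checkout -q -b feature       # hypothetical stand-in for feature/orphans
commit H
commit I
git push -q origin feature       # their feature now holds the old tip
old_tip=$(git rev-parse HEAD)

# Rewrite history locally: copy I onto G, leaving H behind.
git rebase -q --onto "$base" feature~1 feature

# An ordinary push is refused, because it would drop commits on their side.
if git push -q origin feature 2>/dev/null; then
  echo "unexpected: plain push succeeded"
else
  echo "plain push rejected"
fi

# "I think your feature holds $old_tip. If so, take my new tip instead."
git push -q --force-with-lease=feature:"$old_tip" origin feature
```

If someone else had pushed to feature in the meantime, their tip would no longer match $old_tip and this last push should be refused as well, which is the safety check that plain --force skips.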