
Git: remove some commits from a branch


Let's say I have, in a given working branch, a history of commits that looks like the below:

commithash1 [task-a] lorem ipsum
commithash2 [task-a] dolor sit amet
commithash3 [task-b] consectetur adipiscing
commithash4 [task-c] elit se do
commithash5 [task-a] eiusmod tempor

It looks like that because someone merged the contents of the develop branch into this working branch and then added some new commits on top.

The idea is to keep only the commits labeled "[task-a]" on this branch, that is, commit hashes 1, 2, and 5, and remove the others. I need to group them so that I can squash these commits and rebase onto develop right after.

Is that doable? If so, how can I accomplish that?


Solution

  • There is an answer(ish) in a comment (at least right now) but it has a typo in it. Matt's comment has a useful warning as well: it's generally unwise to rebase commits that someone else is using.

    Note, too, that if the parent of commit-hash-5 is commit-hash-4, what you'll end up with is a new and improved commit with commit-hash-6, whose parent is commit-hash-2. The old commit-hash-5 commit continues to exist. Your pseudo-git log --oneline output is missing key --graph data so I cannot be sure what's going on here. But we can say some things generally about this.

    In particular, a branch name simply holds the raw hash ID of some commit. Whatever hash ID is stored in the branch name, we call that the tip commit of the branch. The output from git log --oneline is normally shown in reverse, newest commit first: that is, rather than:

    commithash1 [task-a] lorem ipsum
    commithash2 [task-a] dolor sit amet
    commithash3 [task-b] consectetur adipiscing
    commithash4 [task-c] elit se do
    commithash5 [task-a] eiusmod tempor
    

    we'd see:

    commit5 (branchname) [task-a] eiusmod tempor
    commit4 [task-c] elit se do
    commit3 [task-b] consectetur adipiscing
    commit2 [task-a] dolor sit amet
    commit1 [task-a] lorem ipsum
    

    (and with --graph we'd see if there are any branch-and-merge sequences in there as well). Running git rebase -i HEAD~5 (not HEAD^5, which is the typo) would bring up a rebase command sheet with pick commands for each of the commit hashes, and in the edit sheet they're shown in normal-human-person order, i.e., forwards, rather than in backwards Git order. But Git actually finds them backwards: rebase has to fix that, and git log doesn't have to fix that and normally doesn't bother.
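To see the two orderings side by side, here is a minimal sketch in a throwaway repository. The temporary directory, the identity settings, and the empty commits are assumptions made for illustration only; the commit subjects are the ones from the question.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchname   # name the branch as in the answer
git config user.name "Example"
git config user.email "example@example.com"

# One base commit plus the five commits from the question (empty, for brevity).
for msg in "base" \
           "[task-a] lorem ipsum" \
           "[task-a] dolor sit amet" \
           "[task-b] consectetur adipiscing" \
           "[task-c] elit se do" \
           "[task-a] eiusmod tempor"; do
  git commit -q --allow-empty -m "$msg"
done

# Git order: newest first, with graph lines.
git log --oneline --graph

# Forward order, the way the rebase command sheet lists the same five commits.
git log --oneline --reverse HEAD~5..HEAD

newest=$(git log -1 --format=%s)
first_listed=$(git log --format=%s --reverse HEAD~5..HEAD | head -n 1)

# HEAD~5 walks back five first-parent steps. HEAD^5 asks for a fifth parent,
# which an ordinary single-parent commit does not have.
if git rev-parse --verify --quiet "HEAD^5" >/dev/null; then
  caret_resolves=yes
else
  caret_resolves=no
fi
echo "HEAD^5 resolves: $caret_resolves"
```

In this linear history HEAD^5 does not resolve at all, which is why the typo matters.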

    What I like to do here is draw the commits, using single uppercase letters to stand in for each commit. There's some commit "before" hash #1; let's call that commit B, and assign letters C-D-E-F-G to the remaining five commits and draw those in, like this:

    ...--B--C--D--E--F--G   <-- branchname
    

    We say that the name branchname points to commit G, here. Commit G stores a full snapshot of all files, plus some metadata, and the metadata in commit G stores the hash ID of earlier commit F. So we say that commit G points to commit F. Commit F, being a commit, also stores a snapshot and metadata, and its stored hash ID is that of earlier commit E, i.e., [task-b] consectetur adipiscing. Of course commit E is a commit, so it points backwards to commit D, and so on.
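The "points to" relationships can be inspected directly. The following is a sketch in a scratch repository (the directory, identity, and file names are assumptions for illustration): it resolves the branch name to a hash ID and reads the parent hash ID out of the tip commit's metadata.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchname
git config user.name "Example"
git config user.email "example@example.com"

echo f > F.txt && git add F.txt && git commit -q -m "[task-c] elit se do"
echo g > G.txt && git add G.txt && git commit -q -m "[task-a] eiusmod tempor"

# The branch name resolves to exactly one hash ID: the tip commit (G here).
tip=$(git rev-parse branchname)
echo "branchname -> $tip"

# The raw commit object: a tree (the snapshot), a parent line, author,
# committer, and the log message.
git cat-file -p branchname

# The stored parent hash ID is what makes G "point to" F.
stored_parent=$(git cat-file -p branchname | sed -n 's/^parent //p')
parent=$(git rev-parse branchname~1)
echo "parent stored in tip: $stored_parent"
```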

    There are two more constraints we have here:

    • A branch name can only point to one commit. We can pick any commit, or make any new commits that we like, but we only get to point it to one commit.

    • No commit can ever change. (Not even git commit --amend actually changes a commit: it fakes it; the --amend is a lie, albeit a useful lie.)
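A quick way to check the second constraint is to compare hash IDs before and after an amend. This is a small sketch in a scratch repository, with made-up file contents and messages:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two >> file.txt && git commit -q -am "second"

before=$(git rev-parse HEAD)
git commit -q --amend -m "second, reworded"
after=$(git rev-parse HEAD)

echo "before amend: $before"
echo "after amend:  $after"

# The original commit is untouched and still in the repository;
# the branch name simply moved to a new commit with the same parent.
git cat-file -t "$before"
```

The two hash IDs differ, and both commits share the same parent: the amend made a copy and moved the branch name to it.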

    Since commits C-D are fine as is, we can leave them alone. The problem starts at commit E, and becomes significant at commit G, because G points backwards to F, and that's the intolerable error.

    To fix that error, we must copy G to a new and improved G'. There will be two key differences between commit G and the new-and-improved G':

    • the parent of G' will be D, not F;
    • the snapshot in G' will be D's snapshot plus whatever changed between F and G.

    To achieve this, we'll essentially check out commit D and run git cherry-pick to copy G, giving:

    ...--B--C--D--E--F--G
                \
                 G'
    

    as the overall set of commits in the repository. This leaves us with two more problems:

    • We now have to make the name branchname point to G'. We can do this with git branch -f (if we're not "on" the branch at the time) or git reset --hard (if we are "on" the branch at the time).

    • When we move branchname, we'll lose the ability to find commit G, unless we save its hash ID somewhere. As long as commit G is "trash" or "junk" or "rubbish" or whatever, to be swept up and disposed of later, that's fine. But commit G is how we find commit F aka [task-c] elit se do. So we need to create some name to remember the hash ID of F.
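Done by hand, the copy-and-move steps might look like the sketch below. It runs in a scratch repository where each commit adds its own file so nothing conflicts; the commit() helper, the file names, and the identity settings are hypothetical, and save-them is the name used later in this answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchname
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$2"; }
commit B "base"
commit C "[task-a] lorem ipsum"
commit D "[task-a] dolor sit amet"
commit E "[task-b] consectetur adipiscing"
commit F "[task-c] elit se do"
commit G "[task-a] eiusmod tempor"

d=$(git rev-parse HEAD~3)
g=$(git rev-parse HEAD)

git branch save-them HEAD~1          # remember F before anything moves
git checkout -q --detach "$d"        # check out D without being "on" a branch
git cherry-pick "$g" >/dev/null      # copy G, producing G' on top of D
g_prime=$(git rev-parse HEAD)

git branch -f branchname "$g_prime"  # allowed: we are not "on" branchname
git checkout -q branchname

git log --oneline --graph --all
```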

    The way to do that is:

    • create a new name now, to remember F; then
    • use git rebase (with or without -i) to create G' and move branchname.

    One sequence of commands to do this is, assuming we're on branch branchname right now:

    git branch save-them HEAD~1      # create save-them to remember F
    git rebase --onto HEAD~3 HEAD~1  # copy G to G'
    

    The --onto argument's value HEAD~3 tells git rebase where to put the copy. The HEAD~1 argument tells git rebase what not to copy: i.e., copy commit G, but not commit F or anything earlier. (Rebase copies the commits that are reachable from HEAD but not reachable from its other argument, the one that does not belong to --onto, which the documentation calls the upstream. That is the set git log HEAD~1..HEAD would show, which is just commit G if we're on branch branchname when we start all this.)

    We can draw the final result this way:

                       G   [abandoned]
                      /
    ...--B--C--D--E--F   <-- save-them
                \
                 G'  <-- branchname (HEAD)
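To try the two commands safely, here is a sketch that rebuilds the B-through-G history in a scratch repository and then checks the result. The commit() helper, the one-file-per-commit layout, and the identity settings are assumptions for illustration; in a real repository the copy can hit conflicts if G depends on changes made in E or F.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchname
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one new file per commit, so the copy applies cleanly.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$2"; }
commit B "base"
commit C "[task-a] lorem ipsum"
commit D "[task-a] dolor sit amet"
commit E "[task-b] consectetur adipiscing"
commit F "[task-c] elit se do"
commit G "[task-a] eiusmod tempor"

git branch save-them HEAD~1            # create save-them to remember F
git rebase -q --onto HEAD~3 HEAD~1     # copy G to G' on top of D

# Both names, drawn as a graph.
git log --oneline --graph --all

branch_subjects=$(git log --format=%s --reverse branchname)
saved_tip=$(git log -1 --format=%s save-them)
echo "$branch_subjects"
```

After the rebase, branchname holds only the base commit and the three [task-a] commits, and the working tree no longer contains the files added by E and F.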
    

    Note that commit G still exists. It's just that we've forgotten how to find it. We won't see it in git log output. If that particular commit exists in any other clone, it still exists in that other clone, and that could be a problem later, but if we're the only Git repository in the entire universe that holds commit G, the fact that we have (deliberately) forgotten how to find it means nobody will find it any more.
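The claim that G survives but is no longer findable by name can be checked as well. This sketch records G's hash ID before the rebase (the only reason it can still be looked up afterwards); the scratch repository, helper, and file names are again assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branchname
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$2"; }
commit B "base"
commit C "[task-a] lorem ipsum"
commit D "[task-a] dolor sit amet"
commit E "[task-b] consectetur adipiscing"
commit F "[task-c] elit se do"
commit G "[task-a] eiusmod tempor"

g=$(git rev-parse HEAD)                # write down G's hash ID first
git branch save-them HEAD~1
git rebase -q --onto HEAD~3 HEAD~1

# The object is still in the repository ...
git cat-file -t "$g"

# ... but no branch name leads to it any more.
holders=$(git branch --contains "$g")
if git merge-base --is-ancestor "$g" branchname; then
  reachable=yes
else
  reachable=no
fi
echo "branches containing G: '${holders}'"
echo "G reachable from branchname: $reachable"
```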