
Squashing old git commits that were before a merge


I'm cleaning up a git repo to make it easier to understand. Until now, it's been private, so I'm ok changing history.

Mostly, I've been squashing commits together into meaningful sets.

The problem is that the project, early in its history, was a merge of two other projects. I'm having trouble squashing commits that came before the merge.

My question is: how do I do this?

To be concrete: I have

xxxxxx * master: Latest commit
xxxxxx * another commit
xxxxxx *   Merge projectA and projectB
xxxxxx |\
0000A6 | * a minor commit in project A
0000A5 | * another minor commit

And I would like to squash 0000A5 and 0000A6 together.

When I try an interactive rebase, magit (the Emacs front-end to git that I happen to be using) warns me "Proceed despite merge in rebase range?" When I continue, it fails with the following output (taken from my actual work, not the simplified example above). I'm not sure whether the problem is rebasing commits from before the merge or something else. The "untracked files" line is suspicious, since the merge consisted of moving the client project into a "client/" subdirectory of a larger project.

Last commands done (5 commands done):
   pick fb5de84 Simplified schema:
   squash 96ac7ac Revert schema to full complexity
Next commands to do (29 remaining commands):
   pick 4ad389a Just indentation, for readability
   pick 4241835 First pass at schema
You are currently editing a commit while rebasing branch 'master' on 'ad91ab4'.

Untracked files:
    client/

No changes
You asked to amend the most recent commit, but doing so would make
it empty. You can repeat your command with --allow-empty, or you can
remove the commit entirely with "git reset HEAD^".

Could not apply 96ac7ac7753c03e83b8c1296d892ce9c5fea44c7... Revert schema to full complexity

Solution

  • It seems there are a couple things going on here. I'm not familiar with the front-end you're using, but it looks like it's trying to automate the rebase and bailing out if there's a pause that would normally require manual intervention but that it doesn't know how to fix.

    While rebasing through a merge can be tricky, I don't think that has anything to do with the current problem. It looks to me like 96ac7ac is just a revert of fb5de84, so when you squash them together you get an empty commit. This is permissible, but requires manual intervention. (The rebase would stop, you'd say something like git commit --allow-empty and then proceed.)

    You could confirm whether 96ac7ac really is a perfect revert by doing

    git diff 96ac7ac fb5de84^
    

    If it is, and if your front-end can't accommodate the required intervention for the rebase, then you could drop these two commits instead of squashing them.
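
    As a rough illustration of that check, here is a sketch that builds a throwaway repository in which the second commit reverts the first and then tests whether the two cancel out. The file name, commit messages and variable names are made up for the demo; in your repository you would run only the git diff line with your own commit IDs.

```shell
#!/bin/sh
# Throwaway demo: the second commit reverts the first, so squashing them
# together would leave an empty commit. All names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > schema.txt
git add schema.txt
git commit -q -m "Base"

echo "simplified" > schema.txt
git commit -q -am "Simplified schema"
orig=$(git rev-parse HEAD)        # plays the role of fb5de84

git revert --no-edit "$orig" >/dev/null
revert=$(git rev-parse HEAD)      # plays the role of 96ac7ac

# --quiet implies --exit-code: exit status 0 means the two trees are identical.
if git diff --quiet "$revert" "$orig^"; then
    verdict="perfect revert"
else
    verdict="not a perfect revert"
fi
echo "$verdict"
```

    An empty diff (exit status 0) means the pair contributes nothing, so dropping both commits from the todo list should be equivalent to squashing them.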

    The next thing you might run into is that rebase will try to make the history linear unless you provide the --preserve-merges option (newer Git versions deprecate it in favor of --rebase-merges). I think that's what your front-end is warning you about, and once you tell it to proceed I don't know if it's passing that option or not.

    Even with the correct options, the problem with rebasing through a merge is that any work from the original merge beyond default automatic resolution can be lost. If manual conflict resolution is needed, then again git will stop and expect you to perform the resolution (which your front-end might not cooperate with). Also, in cases where auto-resolution would succeed and yet the merge contains manually-applied changes (thankfully rare but not impossible), the changes can be silently lost.
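
    For reference, a rebase that keeps the merge might look like the sketch below. It is only an illustration under some assumptions: it uses --rebase-merges (available from Git 2.18, the successor to --preserve-merges), it runs in a throwaway repository with made-up commit names, and it edits the todo list with sed through GIT_SEQUENCE_EDITOR as a scripted stand-in for what you would do by hand in your editor or front-end. The merge here auto-resolves, so none of the caveats above are exercised.

```shell
#!/bin/sh
# Throwaway demo of squashing a pre-merge commit while keeping the merge.
# Assumes Git 2.18+ (for --rebase-merges); every name here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Helper: append a line to a file and commit it, using the line as the subject.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit main.txt "X"
root=$(git rev-parse HEAD)
git checkout -q -b side
commit side.txt "D"
commit side.txt "E"
commit side.txt "F-squash-me"
git checkout -q master
commit main.txt "A"
commit main.txt "B"
git merge -q --no-ff -m "M: merge side" side
commit main.txt "C"
tree_before=$(git rev-parse 'HEAD^{tree}')

# Scripted stand-in for editing the todo list by hand: change the "pick" line
# for F into "fixup" so that F is folded into E.
GIT_EDITOR=true \
GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick \(.*F-squash-me\)/fixup \1/'" \
    git rebase -q -i --rebase-merges "$root"

commits_after=$(git rev-list --count HEAD)
merges_after=$(git rev-list --count --merges HEAD)
tree_after=$(git rev-parse 'HEAD^{tree}')
echo "commits: $commits_after, merges: $merges_after"
```

    Comparing the tree before and after is the important part: the history changes, but the final content should not.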

    Another option is to "work around" the merges (if there aren't too many of them). If you have

    X --- A --- B --- M --- C <--(master)
      \             /
       D --- E --- F
    

    and you want to squash E and F, then you could first

    git checkout F
    git checkout -b temp_branch
    git rebase -i D
    

    and set up the squash, giving you

    X --- A --- B --- M --- C <--(master)
      \             /
       D --- E --- F
        \
         EF <--(temp_branch)
    

    Then redo the merge

    git checkout B
    git merge temp_branch
    

    During this merge, you'd have to reproduce any work that was done in the original merge M. If M simply auto-resolved, then the new merge (M') should as well. You can confirm that it's good with

    git diff M M'
    

    If this shows differences, you'll have to manually apply them (which is probably what happened with M as well). I believe you could get the work tree looking right and then commit with --amend.

    Of course if the merge signals conflicts you'll have to resolve them; again diffing against M should provide good guidance.

    When this is done you have

             <*>- M --- C <--(master)
                 /
    X --- A --- B --- M' <--((HEAD))
      \              /     
       D --------- EF <--(temp_branch)
        \
         E --- F -<*>
    

    (Sorry about the weird notation, the graph got crazy on me; the <*>s should be a single line from F to M. But don't worry, we're about to untangle it.)

    Clean up and do a final rebase of the post-merge commits:

    git branch --delete temp_branch
    git rebase --onto M' M master
    

    yielding

    X --- A --- B --- M' --- C' <--(master)
      \              /     
       D --------- EF
    

    You can see that this can be quite tedious, especially if there are many merges and/or merge(s) containing manual conflict resolutions or other non-automatic changes. Validating the final state is key. (You can use the reflog to help with this, or tag the original master HEAD before starting all of this. For example using the reflog:

    git diff master master@{1}
    

    This notation can get messy if you want to also validate each historical commit...)
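
    Putting the steps above together, the whole workaround can be sketched end to end in a throwaway repository. This is an illustration, not a recipe for your actual history: the tags named after the graph letters, the before-cleanup tag and the file names are all made up, the merge auto-resolves, and git reset --soft is used as a non-interactive stand-in for the git rebase -i D squash.

```shell
#!/bin/sh
# Throwaway demo of the "work around the merge" approach.
# Tag names, file names and the helper are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

# Build X-A-B-M-C with the side line D-E-F, tagging commits by letter.
commit main.txt X
git checkout -q -b side
commit side.txt D; git tag D
commit side.txt E
commit side.txt F; git tag F
git checkout -q master
commit main.txt A
commit main.txt B; git tag B
git merge -q --no-ff -m "Merge side" side; git tag M
commit main.txt C
git tag before-cleanup             # safety net: the original master

# Squash E and F on a temporary branch.
git checkout -q -b temp_branch F
git reset -q --soft D              # scripted stand-in for `git rebase -i D`
git commit -q -m "EF"

# Redo the merge on top of B and check it against the original merge.
git checkout -q --detach B
git merge -q --no-ff -m "Merge side" temp_branch
new_merge=$(git rev-parse HEAD)
git diff --quiet M "$new_merge"    # aborts the script if the trees differ

# Clean up and replay the post-merge commits onto the new merge.
git branch -q --delete temp_branch
git rebase -q --onto "$new_merge" M master

# The rewritten master must have the same content as the original.
git diff --quiet master before-cleanup && result="trees match"
echo "$result"
```

    Tagging the original tip before starting means the comparison does not depend on how many reflog entries the cleanup itself produced.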

    Yet another variation: instead of redoing the merge and rebasing post-merge work onto it, you could squash E and F and then run git filter-branch with --parent-filter to re-parent M from (B, F) to (B, EF). See the git filter-branch documentation for details on how to use the --parent-filter (though you'll have to tweak the example a bit to deal with a merge). This could work because you specifically want the result to have an unchanged tree.
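
    One possible shape for that tweak is sketched below, again in a throwaway repository with made-up names. It relies on the parent filter receiving the parent list on stdin in the form "-p <hash> -p <hash>", so a sed substitution of F's full hash with EF's only affects the merge. Treat it as a starting point under those assumptions rather than a tested procedure for a real history, and note that current Git prints a warning recommending other tools over filter-branch (silenced here with FILTER_BRANCH_SQUELCH_WARNING).

```shell
#!/bin/sh
# Throwaway demo: re-parent the merge from (B, F) to (B, EF) with
# git filter-branch --parent-filter. Every name here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

commit main.txt X
git checkout -q -b side
commit side.txt D; d=$(git rev-parse HEAD)
commit side.txt E
commit side.txt F; f=$(git rev-parse HEAD)
git checkout -q master
commit main.txt A
commit main.txt B
git merge -q --no-ff -m "Merge side" side
commit main.txt C
tree_before=$(git rev-parse 'master^{tree}')

# Squash E and F into EF on a temporary branch
# (scripted stand-in for `git rebase -i D`).
git checkout -q -b temp_branch "$f"
git reset -q --soft "$d"
git commit -q -m "EF"
ef=$(git rev-parse HEAD)
git checkout -q master

# Swap F for EF wherever F appears as a parent; only the merge has F as a
# parent, so every other commit keeps its parents as they were.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch -f --parent-filter "sed 's/$f/$ef/'" -- master >/dev/null

new_side_parent=$(git rev-parse 'master~1^2')   # second parent of the new merge
tree_after=$(git rev-parse 'master^{tree}')
echo "merge now has side parent $new_side_parent"
```

    Because no tree is touched, the content of master should be identical before and after; only the parent links of the merge and the commits after it change.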