
Remove useless merges (those without any 'non-mainline' commits) after filter-branch


I ran git filter-branch --index-filter 'git rm -r --cached --ignore-unmatched badfiles/ badfiles2/' --prune-empty to remove a bunch of files in preparation for moving the remaining files to another repository. --prune-empty gets rid of any resulting empty commits, but it doesn't act on merges, which makes sense.

Now the history for this particular repo looks pretty ugly with a bunch of merges that don't actually add anything and some merges that are just merges of other merges that didn't actually add any changes (in the rewritten history; they may have been 'useful' before the filter-branch).

Consider this annotated snippet (generated with git log --graph --oneline --shortstat):

*   575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\  
| * 5dbc3f1 Actual feature changes
| |  2 files changed, 2 insertions(+), 2 deletions(-)
| * 35abc98 Cleanup/prep
|/  
|    2 files changed, 22 insertions(+), 16 deletions(-)
*   c3b3d86 Merge pull request #46 from org/topic_branch-mods # USELESS-C
|\  
* \   892de05 Merge pull request #47 from org/topic_branch # USELESS-B
|\ \  
| |/  
|/|   
| *   e738d4b Merge branch 'master' into topic_branch # USELESS-A
| |\  
| |/  
|/|   
* | 4182dac CommitMsg #40 #SQUASHED-PR
| |  2 files changed, 15 insertions(+), 6 deletions(-)
* | 3b42762 CommitMsg
|/  
|    2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
|  2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
   4 files changed, 241 insertions(+)

I'd like to shorten this to the following (with different commit IDs where the history changes, of course):

*   575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
|\  
| * 5dbc3f1 Actual feature changes
| |  2 files changed, 2 insertions(+), 2 deletions(-)
| * 35abc98 Cleanup/prep
|/  
|    2 files changed, 22 insertions(+), 16 deletions(-) 
* 4182dac CommitMsg #40 #SQUASHED-PR
|  2 files changed, 15 insertions(+), 6 deletions(-)
* 3b42762 CommitMsg
|  2 files changed, 29 insertions(+), 14 deletions(-)
* c4e62ba CommitMsg
|  2 files changed, 39 insertions(+), 16 deletions(-)
* c2bb13f CommitMsg
   4 files changed, 241 insertions(+)

So I'd like to get rid of the 'USELESS' merges, which are all 'empty' merges (no merge changes), but I'd like to preserve the history/grouping associated with the also-'empty' KEEP merge at the top, which groups those commits together into one 'changeset'.

Or looking at another example in the traditional simplified-sideways-history:

A -- B -- C -- D   ==>  A -- B --- D'
 \----\--/   /                \-E-/
       \----E 

I have tried solutions that remove 'empty' merges, but those remove all empty merges, and I want to keep the 'useful' empty merges as displayed in the examples...

As far as I can tell, the 'useless' empty merges don't contain any commits that aren't all the way to the left/top in the history. Is there a way to filter those out cleanly? I guess I don't really even know how to describe/define those...

Note that the given example was intentionally simple. For what it's worth, later in the history this repo looks like this, all of which I'd like to prune:

*   3d37e42 Merge pull request #239 from jim/topic-dev
|\  
| *   05eaf9e Merge pull request #7 from org/master
| |\  
| |/  
|/|  
* |   1576482 Merge pull request #193 from john/master
|\ \  
| * \   187100e Merge branch 'master' of github.com:org/repo into master
| |\ \  
| * \ \   067cc55 Merge branch 'master' of github.com:org/repo into master
| |\ \ \  
| * \ \ \   a69e3d2 Merge branch 'master' of github.com:org/repo into master
| |\ \ \ \  
| | |/ / /  
* | | | |   0ce6813 Merge pull request #212 from jim/feature
|\ \ \ \ \  
| | |_|_|/  
| |/| | |   
| * | | |   0f5352e Merge pull request #5 from org/master
| |\ \ \ \  
| |/ / / /  

Solution

  • OK, I don't think this is perfect, but it does solve the problem in this particular case; there are cases where it doesn't quite clean up as much as it perhaps could, but it's a step if anyone is interested:

    git filter-branch --commit-filter '
    if ! git rev-parse --verify "$GIT_COMMIT^2" 1>/dev/null 2>&1 ||
      [ "$(git log --no-merges "$GIT_COMMIT^2" "^$GIT_COMMIT^1" --oneline | wc -l)" -gt 0 ];
    then
      #echo take $GIT_COMMIT >&2
      # Pick one:
      git_commit_non_empty_tree "$@" # Drop empty commits
      #git commit-tree "$@" # Keep empty commits
    else
      #echo "breakup $GIT_COMMIT ($*)" >&2
      skip_commit "$1" "$2" "$3" # (quietly) only keep the first parent
    fi' -f HEAD
    

    The commit is kept if either 1) it has no second parent (git rev-parse returns an error when $GIT_COMMIT^2 doesn't exist), or 2) its second parent ($GIT_COMMIT^2) contains non-merge commits that its first parent ($GIT_COMMIT^1) does not. Kept commits go through git_commit_non_empty_tree, so they are still dropped if they turn out to be empty; use git commit-tree instead if you want to keep empties. If the second parent exists and doesn't add anything useful, we skip the commit and intentionally pass only the first parent to skip_commit. I'm not sure this is 'legit', but it drops the second parent from the history, and it worked in my case (see the caveats below).
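
    To preview which merges this test treats as 'useless' before rewriting anything, the same condition can be run read-only over the history. This is only a minimal sketch, not part of the filter itself: list_droppable_merges is a hypothetical helper name, and it assumes no merge has more than two parents.

```shell
# Hypothetical helper (not a git command): list merges reachable from a revision
# whose second parent brings in no non-merge commits beyond the first parent.
list_droppable_merges() {
  rev=${1:-HEAD}
  git rev-list --merges "$rev" | while read -r merge; do
    # Count non-merge commits reachable from parent 2 but not from parent 1.
    extra=$(git rev-list --count --no-merges "$merge^2" "^$merge^1")
    if [ "$extra" -eq 0 ]; then
      git log -1 --format='%h %s' "$merge"
    fi
  done
}

# Usage (inside the repository):
#   list_droppable_merges HEAD
```

    Each printed line is a merge the filter would skip. A merge that passes the test but is reachable only through a skipped merge's second parent (USELESS-A in the example) is not listed, although it also disappears from the rewritten history.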

    Working through the original graph from the bottom up:

    *   575e3b5 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
    |\  
    | * 5dbc3f1 Actual feature changes
    | |  2 files changed, 2 insertions(+), 2 deletions(-)
    | * 35abc98 Cleanup/prep
    |/  
    |    2 files changed, 22 insertions(+), 16 deletions(-)
    *   c3b3d86 Merge pull request #46 from org/topic_branch-mods # USELESS-C
    |\  
    * \   892de05 Merge pull request #47 from org/topic_branch # USELESS-B
    |\ \  
    | |/  
    |/|   
    | *   e738d4b Merge branch 'master' into topic_branch # USELESS-A
    | |\  
    | |/  
    |/|   
    * | 4182dac CommitMsg #40 #SQUASHED-PR
    | |  2 files changed, 15 insertions(+), 6 deletions(-)
    * | 3b42762 CommitMsg
    |/  
    |    2 files changed, 29 insertions(+), 14 deletions(-)
    * c4e62ba CommitMsg
    |  2 files changed, 39 insertions(+), 16 deletions(-)
    * c2bb13f CommitMsg
       4 files changed, 241 insertions(+)
    

    It kept everything through SQUASHED-PR (note that commit 4182dac and its ancestors keep their IDs because their history didn't change). It decided USELESS-A should stick around because its second parent (4182dac) contains commits that its first parent (c4e62ba) does not. It then looked at USELESS-B, whose second parent (USELESS-A and everything behind it) doesn't add anything useful, so it dropped USELESS-B, and USELESS-A disappeared along with that dropped second parent. USELESS-C was simply useless, so it got dropped, and KEEP had 'something useful' in its second parent, so it was retained. So you end with:

    *   63b4d39 Merge pull request #68 from chris/feature # KEEP THIS MERGE!
    |\  
    | * 9a5570d Actual feature changes
    | |  2 files changed, 2 insertions(+), 2 deletions(-)
    | * a251317 Cleanup/prep
    |/  
    |    2 files changed, 22 insertions(+), 16 deletions(-) 
    * 4182dac CommitMsg #40 #SQUASHED-PR
    |  2 files changed, 15 insertions(+), 6 deletions(-)
    * 3b42762 CommitMsg
    |  2 files changed, 29 insertions(+), 14 deletions(-)
    * c4e62ba CommitMsg
    |  2 files changed, 39 insertions(+), 16 deletions(-)
    * c2bb13f CommitMsg
       4 files changed, 241 insertions(+)
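
    Skipping a merge should not change any file content, so one sanity check after the rewrite is to compare the tree at the rewritten tip with the backup that filter-branch keeps under refs/original/. The following is a sketch under that assumption; same_tree is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when two revisions point at identical trees,
# i.e. the file content is the same even if the history differs.
same_tree() {
  [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}

# Usage after the rewrite (inside the repository, on the rewritten branch):
#   branch=$(git symbolic-ref --short HEAD)
#   same_tree "refs/original/refs/heads/$branch" HEAD && echo "content unchanged"
```

    If the trees differ, one of the skipped merges probably carried changes of its own, and it is worth inspecting before discarding the backup ref.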
    

    Important Caveats

    • This only works for simple histories where no commit has more than two parents: the filter only tests the second parent, and it explicitly passes "$1" "$2" "$3" to skip_commit, leaving off "$4" "$5" (which would otherwise be included in "$@"). If a commit has more than two parents, you'll have to adjust the filter to account for that; it shouldn't be too hard, but I haven't fixed it for a hypothetical case, and you may want to choose specific parents to drop.
    • If there were a 'useful' commit after USELESS-A before it got merged to USELESS-B (which arguably wouldn't be useless then), USELESS-A will not get pruned/dropped, so you'll still have some ugliness perhaps.
    • There are likely other scenarios where this doesn't work or could be improved. Please add suggestions in the comments (as usual) if you find any!
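
    For the first caveat, one possible adjustment is to test every parent after the first instead of only $GIT_COMMIT^2. This is an untested-in-anger sketch rather than a drop-in fix; has_useful_side_parent is a hypothetical helper name.

```shell
# Hypothetical helper for merges with any number of parents.
# Succeeds if some parent after the first contributes at least one non-merge
# commit that is not already reachable from the first parent.
has_useful_side_parent() {
  commit=$1
  # rev-list --parents prints "<commit> <parent1> <parent2> ..." on one line.
  set -- $(git rev-list --parents -n 1 "$commit")
  shift                        # drop the commit id itself
  [ "$#" -ge 2 ] || return 1   # fewer than two parents: not a merge
  first=$1
  shift
  for side in "$@"; do
    if [ "$(git rev-list --count --no-merges "$side" "^$first")" -gt 0 ]; then
      return 0
    fi
  done
  return 1
}
```

    To use it, the function definition would need to be pasted at the top of the --commit-filter string (the filter runs in its own shell, so functions from your interactive shell are not visible there), and the second condition replaced with has_useful_side_parent "$GIT_COMMIT". Note that skip_commit "$1" "$2" "$3" would still reduce a skipped merge to its first parent only.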