Tags: git, svn, git-filter-branch, squash

How to squash commits in branches


I am importing an old svn repo into git. At one point a folder was renamed in all branches. In svn this was done by creating a duplicate with history in one commit, followed by a delete of the original in a second commit. So I have a repo that looks like this:

A -> B -> C -> D* -> E* -> F -> G -> H
      \-> 1 -> 2* -> 3* -> 4 -/

Where D/E and 2/3 are the commits I want to squash. The reason for squashing is that while svn knows of "duplicate with history", git doesn't see this as a rename since the original files weren't removed until the next commit, and I lose history on blame at this point.

I've experimented with some rebase scripts which work, but they also flatten all my branches. The above is a seriously simplified version of what I have to do, which is why I really need scripts as I can't do it manually. There are over 1,000 branches throughout the history of the SVN repo and probably a dozen parallel branches where this change was done (all at the same time).

The git repo has not been published yet, so preserving hashes is irrelevant. I assume I'll need some filter-branch script, but I'm still trying to figure out how to manage that, which is what I was hoping to get help with here. I can provide the sha1 of every commit that needs to be squashed, along with its parent.


Solution

  • You want to run git filter-branch with a --parent-filter that replaces any appearance of D's SHA with C's SHA in the parent lists. You can also look into .git/info/grafts or git replace, which might be simpler than writing a --parent-filter and can be made permanent with a filter-branch.
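
For reference, here is a minimal sketch of the --parent-filter route on a throwaway repository. The directory names, file contents, and commit messages are made up for the demo, and it keys on $GIT_COMMIT (the commit being rewritten) rather than on a sed substitution, in the style of the example in the filter-branch documentation. Treat it as an illustration under those assumptions, not a drop-in script for your repo.

```shell
set -e
# Demo identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the warning and pause that newer git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository: C -> D (copy old/ to new/) -> E (delete old/) -> F
cd "$(mktemp -d)"
git init -q .
mkdir old
echo "some content" > old/file.txt
git add -A && git commit -q -m "C: before the rename"
cp -R old new
git add -A && git commit -q -m "D: copy old/ to new/"
git rm -q -r old
git commit -q -m "E: delete old/"
echo "more" >> new/file.txt
git add -A && git commit -q -m "F: later work"

C=$(git rev-parse HEAD~3)
E=$(git rev-parse HEAD~1)

# For E only, print C as the sole parent; every other commit keeps its parents.
git filter-branch --parent-filter \
  "test \"\$GIT_COMMIT\" = $E && echo '-p $C' || cat" HEAD

# D has dropped out of the history; -M lets the log report renames.
git log --oneline --name-status -M HEAD
```

With a dozen affected branches you would need one test per E commit, which is part of why git replace tends to be less fiddly.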

    Update: As @torek says, you should definitely use git replace. To use a real-life example, here's a rename from readme.md to README.md that was executed with an intermediate rename to README1.md: https://github.com/dahlbyk/posh-git/compare/dahlbyk:2b9342c...dahlbyk:57394c5. Let's call 2b9342c your C and 57394c5 your E:

    $ git tag E 57394c5
    $ git tag C 2b9342c
    $ git tag G 450d8f1
    $ git log --oneline --graph --decorate C~..G
    *   450d8f1 (tag: G) Merge pull request #320 ...
    |\  
    | * 941935c Fix a few kbd / missing markdown issues/
    | * f13dcf9 Upcase readme and have more prompt examples.
    | * 57394c5 (tag: E) Now rename to README.md.
    | * eb79ef2 Prepare to upcase README.md filename.
    * |   536c57f Merge pull request #319 ...
    |\ \  
    | |/  
    |/|   
    | * 7fafb7b Speed up Get-GitStatus
    |/  
    * 2b9342c (tag: C) Merge pull request #313 ...
    

    To pretend that the intermediate move never happened, I can replace E's parent (E~) with its grandparent (E~2 = C):

    $ git log --stat --oneline C..E
    57394c5 Now rename to README.md.
     README1.md => README.md | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    eb79ef2 Prepare to upcase README.md filename.
     readme.md => README1.md | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    $ git replace E~ C
    $ git log --stat --oneline C..E
    57394c5 Now rename to README.md.
     readme.md => README.md | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    eb79ef2 Merge pull request ...
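
If you want to try this without touching the real import, the sketch below repeats the same replacement on a throwaway repository (the file names and commit messages are invented for the demo). It also compares the history with and without the replacement, since git replace only changes how objects are read until a filter-branch bakes it in.

```shell
set -e
# Demo identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository: C -> D (copy old/ to new/) -> E (delete old/) -> F
cd "$(mktemp -d)"
git init -q .
mkdir old
echo "some content" > old/file.txt
git add -A && git commit -q -m "C: before the rename"
cp -R old new
git add -A && git commit -q -m "D: copy old/ to new/"
git rm -q -r old
git commit -q -m "E: delete old/"
echo "more" >> new/file.txt
git add -A && git commit -q -m "F: later work"
git tag E HEAD~1

# Whenever git reads D (E~), hand it C (E~2) instead.
git replace E~ E~2

# Same history, counted with and without the replacement applied.
replaced_count=$(git rev-list --count HEAD)
original_count=$(git --no-replace-objects rev-list --count HEAD)
echo "commits with replacement: $replaced_count, without: $original_count"

# The replacement is stored under refs/replace/ and can be listed here
# (or removed again with git replace -d).
git replace -l
git diff --name-status -M E~ E
```

Nothing has been rewritten at this point: deleting the replacement ref brings the original history back.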
    

    Finally, a filter-branch will make the changes permanent:

    $ git filter-branch -- ^C G E  # For demo, only rewrite G & E after C
    $ git log --graph --oneline --decorate C~..G
    *   fcfd345 (tag: G) Merge pull request #320 ...
    |\  
    | * fa76267 Fix a few kbd / missing markdown issues/
    | * 4900687 Upcase readme and have more prompt examples.
    | * b25aa5a (tag: E) Now rename to README.md.
    * |   536c57f Merge pull request #319 ...
    |\ \  
    | |/  
    |/|   
    | * 7fafb7b Speed up Get-GitStatus
    |/  
    * 2b9342c (tag: C) Merge pull request #313 ...
    

    For your purposes, you'll do something like:

    $ git replace E~ E~2
    $ git replace 3~ 3~2
    $ git filter-branch -- ^A --all
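
Since you mention having the sha1 of every commit to squash, the same steps can be driven from a list. Below is a rough sketch on a throwaway repository with two branches (main and side, both invented for the demo, as are the file names); the list of "delete" commits is built by hand here, where you would read your own SHAs instead. It assumes each listed commit is the second half of a copy/delete pair, so its parent is the copy commit.

```shell
set -e
# Demo identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the warning and pause that newer git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Small helper for the demo: stage everything and commit with the given message.
commit() { git add -A && git commit -q -m "$1"; }

# main: A -> B -> C -> D* -> E* -> F     side (from B): 1 -> 2* -> 3* -> 4
mkdir old
echo "some content" > old/file.txt
commit "A"
echo b > b.txt;     commit "B"
git branch side
echo c > c.txt;     commit "C"
cp -R old new;      commit "D: copy old/ to new/"
git rm -q -r old;   commit "E: delete old/"
echo f > f.txt;     commit "F"
git checkout -q side
echo 1 > one.txt;   commit "1"
cp -R old new;      commit "2: copy old/ to new/"
git rm -q -r old;   commit "3: delete old/"
echo 4 > four.txt;  commit "4"

# The "delete" commits (E and 3); in practice, your own list of SHAs.
delete_commits="$(git rev-parse main~1) $(git rev-parse side~1)"

# Replace each copy commit (parent) with the commit before it (grandparent).
for e in $delete_commits; do
  git replace "$e~1" "$e~2"
done

# Make the replacements permanent on every ref, then drop the replace refs.
git filter-branch -- --all
git replace -d $(git replace -l)

git log --oneline --graph --all --not --glob='refs/original/*'
```

The old commits stay reachable through refs/original/ until you delete those refs, which is handy for comparing before and after.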
    

    Update 2:

    "The commit message I get is off of E, which I don't care about. I'd rather have D's commit message (or a script provided message)."

    To keep D's commit metadata, I would suggest starting over and using a --commit-filter that commits D with E's tree (the "tree" line of git cat-file -p E) and skips E, e.g.

    git filter-branch --commit-filter '
      if [ "$GIT_COMMIT" = "SHA of D" ];
      then
        git commit-tree "TREE of E" -p "SHA of C";
      elif [ "$GIT_COMMIT" = "SHA of E" ];
      then
        skip_commit "$@";
      else
        git commit-tree "$@";
      fi;
      ' -- ^A E G
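
Here is a runnable variation of that filter on a throwaway repository, as a sketch only. KEEP_COMMIT, SKIP_COMMIT and FINAL_TREE are hypothetical variable names, exported so the shell that runs the filter can see them, and the file names are invented. One deliberate difference from the snippet above: instead of hard-coding C as the parent, it reuses the parent arguments that filter-branch passes in for D, which should also hold up if C itself gets rewritten.

```shell
set -e
# Demo identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the warning and pause that newer git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository: C -> D (copy old/ to new/) -> E (delete old/) -> F
cd "$(mktemp -d)"
git init -q .
mkdir old
echo "some content" > old/file.txt
git add -A && git commit -q -m "C: before the rename"
cp -R old new
git add -A && git commit -q -m "D: copy old/ to new/"
git rm -q -r old
git commit -q -m "E: delete old/"
echo "more" >> new/file.txt
git add -A && git commit -q -m "F: later work"

# Hypothetical names, exported so the commit filter can read them.
export KEEP_COMMIT=$(git rev-parse HEAD~2)           # D
export SKIP_COMMIT=$(git rev-parse HEAD~1)           # E
export FINAL_TREE=$(git rev-parse "HEAD~1^{tree}")   # tree of E

# The filter receives: <tree> -p <parent> ... and the message on stdin.
git filter-branch --commit-filter '
  if [ "$GIT_COMMIT" = "$KEEP_COMMIT" ]; then
    shift   # drop the tree of D, keep its parent arguments
    git commit-tree "$FINAL_TREE" "$@"
  elif [ "$GIT_COMMIT" = "$SKIP_COMMIT" ]; then
    skip_commit "$@"
  else
    git commit-tree "$@"
  fi' HEAD

git log --oneline --name-status -M HEAD
```

Because the message arrives on stdin of the filter, the squashed commit keeps D's message; piping a different message into git commit-tree at that point would give you the script-provided variant.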