
Git tree-filter discards changes again in consecutive commits


We plan on enforcing a clang-format based style in a source repository. We anticipate some difficulties, which is why we want to provide a make target to perform re-formatting for the current branch from its merge base with master to the branch HEAD.

As a simplified example, consider the following command:

git filter-branch -f --tree-filter '
  AFFECTED_FILES=$(git diff-index --diff-filter=AM --name-only $GIT_COMMIT^);
  echo; echo AFFECTED $AFFECTED_FILES;
  for f in $AFFECTED_FILES; do
    echo formatting $f;
    echo foo >> $f;
  done
' HEAD~10..HEAD

We run a tree-filter over a range of commits (limited here to the last ten, which is enough to demonstrate the problem). For each commit we determine the affected files, since we only want to touch files added or modified in that commit. To make the error easier to spot, we do not use clang-format here but simply append "foo" to each affected file; replacing echo foo >> $f with clang-format -i $f is all that is needed to get the actual command.

This applies the intended changes to each commit. However, every commit except the first discards the changes made in its parent. For example, suppose the diff of one commit shows "+foo" in some.txt. In the child commit, the diff then shows "-foo" for some.txt, even though the child only modified someother.txt and never touched some.txt. I have run this on arbitrary test repos and see the same behavior in all of them.

I have also tried the following (coming back to actual clang-format):

git filter-branch -f --tree-filter 'git clang-format --extensions cpp,h' -- HEAD~10..HEAD

While most commits do look correct, the first one modifies all files touched by any commit in the given range. I want to avoid this and format only the files that each commit itself touches.

What am I missing to avoid undoing the changes in child commits? Do I need to update the index in some way?


Solution

  • A tree filter in git filter-branch looks at the state of the files at each original commit, so changing files in one commit has no effect on the state of the files that the tree filter sees for the next commit. This means that if you change a file in only one commit of a git filter-branch invocation, that change is not propagated to the children of that commit. For that file, the trees of those children are unchanged compared to the pre-rewrite commits, so the children appear to undo the custom changes introduced in their rewritten parent.

    To achieve what you want, you will probably want to consider a different set of AFFECTED_FILES, such as the result of a diff against HEAD~10 instead of just the parent commit, so that any file that was rewritten in an earlier commit still gets reformatted in later ones. (Note that this isn't perfect: if a file is reverted to the exact state it had at HEAD~10, it will be omitted from reformatting again. This may be an edge case rare enough that it isn't worth coding around; otherwise, you could include diffs against all parents and against the base of the filter-branch operation.)
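
A minimal sketch of that suggestion, under a few assumptions: the function name rewrite_range is hypothetical, the base commit is passed to the filter through an exported BASE variable (my choice, not something filter-branch provides), and git diff-tree is used to compare the base with the original commit. As in the question, appending "foo" stands in for clang-format -i.

```shell
# Hypothetical helper: rewrite <base>..HEAD, touching in every commit all files
# that were added or modified anywhere between <base> and that commit, not just
# the files that differ from the direct parent.
rewrite_range() {
  # Resolve the base once so the filter compares against a fixed commit.
  BASE=$(git rev-parse "$1") || return 1
  export BASE   # assumption: handed to the tree filter via the environment

  git filter-branch -f --tree-filter '
    # GIT_COMMIT is the original commit currently being rewritten.
    git diff-tree -r --name-only --diff-filter=AM "$BASE" "$GIT_COMMIT" |
    while IFS= read -r f; do
      echo "formatting $f"
      echo foo >> "$f"      # stand-in; replace with: clang-format -i "$f"
    done
  ' "$BASE..HEAD"
}
```

Calling rewrite_range HEAD~10 would then correspond to the range used in the question. A file changed early in the range is processed again in every later commit, so the rewritten children should no longer show a "-foo" for it. The caveat about files reverted to their base state still applies.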