
git diff reporting differences when files are the same


I'm trying to hand merge some changes from one branch into another branch in my repository. I used the following to find the differences:

git diff branchA...branchB path/to/files > diff.patch

This worked great and showed me all of the changes. I then went and updated the files in branchA that I wanted by hand and committed them. Git status shows that everything is up to date. Now I want to run the diff again to make sure I haven't missed anything.

However when I run the diff, it shows me all of the same differences as the first time I ran it. For example it is showing me lines added in branchB even though the exact lines now exist in branchA.

I'm guessing it's because I did the merge by hand, but how do I now get the correct diff again?


Solution

  • The three-dot syntax (branchA...branchB) has a special meaning in git diff that is different from its normal meaning in most[1] git commands.

    Normally, X...Y tells git rev-list to construct the symmetric difference of two sets: the first is the set of all commits reachable from X, and the second is the set of all commits reachable from Y. The symmetric difference is "all commits reachable from either X or Y, but not from both". In other words, if tracing back through the history of X and Y eventually arrives at some common commit(s) (these commits are the Lowest Common Ancestors, or LCAs), then we're left with only the commits that are neither LCAs nor ancestors of the LCAs.

    The LCAs are what we also call the merge bases. In most cases there is exactly one LCA and it is the (singular) merge base of the two commits you identify. When those two commits are the tips of two divergent branches, this makes a lot of sense, and the symmetric difference is "commits after the merge base, on both branches", which is a great starting point for operations like cherry-picking.

    It's not much good for git diff, though. The reason is that git diff compares two and only two commits.[2] If you give it a big set of many commits, it has no idea what to do.

    Because git diff only works with a pair of commits, it redefines both the two-dot X..Y and three-dot X...Y syntaxes. For the two-dot version, it simply compares the two named commits, exactly as git diff X Y would, but for the three-dot version it does something clever:

    git diff X...Y means git diff $(git merge-base X Y) Y

    Since the three-dot syntax normally stops at the (usually single) merge base, git diff assumes that you must be intending to do something merge-ish when you give it this syntax. It therefore finds a merge base for the commits you named. With any luck, there is only the one Lowest Common Ancestor so that this is the merge base. Then it compares the merge base commit to the second of the two named commits.

    If the two names are branch names (as they are in this case), and you check out the first branch and add one or more commits to it (as you did in this case), we can be certain that the merge base of the two branch tips is unchanged: the new commits are ordinary commits, not a real git merge, so they record no link to the second branch. The second branch is also unchanged (because we have modified only the first one, adding new commits to it). Therefore, if we now run a new git diff using the same three-dot syntax, we will compare the exact same two commits and get the exact same output.

    To see this for yourself, use:

    git merge-base branchA branchB
    

    and note the printed commit ID, then check out branchA and add a commit and run the same git merge-base command. You might also want to run git rev-parse branchA and git rev-parse branchB before and after adding the new commit: you will see that branchA acquires a new commit while branchB does not.
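
    The experiment above can be sketched as a script that runs in a throwaway repository. Everything named here (branchA, branchB, file.txt, the commit messages) is made up for illustration; the point is only that a hand merge on branchA leaves both the merge base and the three-dot diff as they were.

```shell
#!/bin/sh
# Hypothetical throwaway repository; branchA, branchB, and file.txt are
# illustrative names, not taken from any real project.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branchA   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

# branchB diverges from branchA and adds a line.
git checkout -q -b branchB
printf 'line from B\n' >> file.txt
git commit -q -am "B: add a line"

# Back on branchA: record the state before the hand merge.
git checkout -q branchA
base_before=$(git merge-base branchA branchB)
diff_before=$(git diff branchA...branchB)

# Hand merge: make the same change on branchA as an ordinary commit.
printf 'line from B\n' >> file.txt
git commit -q -am "A: hand-merge the line from B"

base_after=$(git merge-base branchA branchB)
diff_after=$(git diff branchA...branchB)

# The three-dot form spelled out with an explicit merge base.
diff_explicit=$(git diff "$(git merge-base branchA branchB)" branchB)

echo "merge base before: $base_before"
echo "merge base after:  $base_after"
```

    The two printed commit IDs should match, and diff_before, diff_after, and diff_explicit should all hold the same text, even though branchA now contains the line.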

    You can also run:

    git rev-list --left-right branchA...branchB
    

    which will produce the list of the symmetric difference commits (excluding the merge-base), marking them with < and > characters to show which of the two (left or right side) ancestries selected them. (Change rev-list to log --graph to see the log messages.)
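
    For the original goal (checking whether anything from branchB is still missing in branchA), one option is to compare the two branch tips directly, using either two plain arguments or the two-dot form. The following is a sketch in a throwaway repository, with made-up branch contents, in which only one of branchB's two added lines has been hand-merged:

```shell
#!/bin/sh
# Hypothetical throwaway repository; branchA, branchB, and file.txt are
# illustrative names only.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branchA   # name the initial branch explicitly
git config user.name "Example"
git config user.email "example@example.com"

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base"

# branchB adds two lines in two commits.
git checkout -q -b branchB
printf 'line from B\n' >> file.txt
git commit -q -am "B: add a line"
printf 'second line from B\n' >> file.txt
git commit -q -am "B: add another line"

# Hand-merge only the first of those lines into branchA.
git checkout -q branchA
printf 'line from B\n' >> file.txt
git commit -q -am "A: hand-merge one line"

# Compare the branch tips directly: only what still differs is shown.
tip_diff=$(git diff branchA branchB -- file.txt)
# For git diff, the two-dot form is the same comparison.
two_dot_diff=$(git diff branchA..branchB -- file.txt)

printf '%s\n' "$tip_diff"
```

    Here the tip-to-tip diff should list only the second line as an addition, since the first one already exists on branchA. Note that this comparison also shows, as removals, anything branchA has that branchB lacks, so it answers "how do the two tips differ now" rather than "what did branchB change since the branches split".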


    [1] Any git command that uses git rev-list, anyway. This includes git log (in fact, git log and git rev-list are essentially the same command, with different default outputs).

    [2] Well, git also does what it calls "combined diffs" for merge commits, but that is a separate mode, and it is not what you get when you hand git diff a two-dot or three-dot range.