
git diff --check shows changes of unrelated branch/files


Background

I am trying to see the whitespace errors of the current branch (ignoring CR at eol). Most files use CRLF, and I don't have the core.whitespace config set.

This is the original command:

git -c core.whitespace=trailing-space,cr-at-eol diff --check master..HEAD

HEAD refers to a branch created on top of an older version of master ("oldmaster").

The problem is that git diff --check is behaving in an unexpected way: it is showing not only the errors in master..HEAD, but also the errors in oldmaster..master.

Questions

  • Is this occurring because git diff --check compares the whole snapshots in the given revision range?

  • Why do git log and git diff behave differently in this case?

  • Shouldn't git diff --check compare only the changed lines in the changed files?

Information

master vs. oldmaster (it is a coincidence that both counts are 115):

$ git log --oneline oldmaster..master | wc -l
115

$ git diff --name-only oldmaster..master | wc -l
115

This shows the relevant commits correctly:

$ git log --oneline master..HEAD | wc -l
4

This shows the correct files:

$ git log --oneline --name-only master..HEAD -- | grep -E '^[a-zA-Z]+/' \
  | sort -u | wc -l
4

These for some reason also include the files changed in oldmaster..master:

$ git diff --name-only master..HEAD -- | wc -l
119

$ git -c core.whitespace=trailing-space,cr-at-eol diff --name-only \
  master..HEAD -- | wc -l
119

Both of these also show unrelated files:

$ git diff --check master..HEAD -- | grep -E '^[a-zA-Z]+/' | cut -d : -f 1 \
  | sort -u | wc -l
30

$ git -c core.whitespace=trailing-space,cr-at-eol diff --check master..HEAD \
  -- | grep -E '^[a-zA-Z]+/' | cut -d : -f 1 | sort -u | wc -l
9

Solution

  • Except for combined diffs (which you are not using here), git diff strictly compares two snapshots.[1] The syntax you use, git diff A B or git diff A..B, makes no difference. Git extracts snapshot A, extracts snapshot B, and compares the two. Any options you pass, such as --check, apply to that one comparison. Commit A need not be an ancestor of B, nor vice versa; Git does not look at any of the commits "between" them; it simply extracts A, then extracts B, and diffs those.[2] This is why git diff master..HEAD also lists the files changed in oldmaster..master: they differ between the two snapshots, and the diff has no notion of which branch changed them.

    The git log command does something very different: it walks the revision graph. Given git log A..B, Git finds all commits reachable from B that are not reachable from A. For a clear definition of reachability, see Think Like (a) Git.

    Note that when using -p with git log to view commits as patches, git log compares each commit to its (single) parent. If there are three commits in the A..B range, for instance, git log -p A..B first shows B and runs git diff B^ B, then shows B^ and runs git diff B^^ B^, and last shows B^^ and runs git diff B^^^ B^^. (This assumes there are no merge commits in the range, but git log omits patches for merge commits by default anyway.)
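To make the difference concrete, here is a minimal sketch that builds a throwaway repository shaped like yours: a topic branch forked from an older master, after which master moves on. Every name in it (shared.txt, topic.txt, the topic branch) is made up for illustration, and the whitespace setting is simply copied from your command.

```shell
#!/bin/sh
# Toy repository: all file and branch names below are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
git config user.email "demo@example.com"
git config user.name "Demo"

# "oldmaster": shared.txt starts out with a trailing space.
printf 'line \n' > shared.txt
git add shared.txt
git commit -q -m "base"

# The topic branch adds one clean file and never touches shared.txt.
git checkout -q -b topic
printf 'clean\n' > topic.txt
git add topic.txt
git commit -q -m "topic work"

# Meanwhile master moves on and fixes the whitespace in shared.txt.
git checkout -q master
printf 'line\n' > shared.txt
git commit -q -a -m "fix whitespace"
git checkout -q topic

ws="core.whitespace=trailing-space,cr-at-eol"

# Graph walk: only the commits that are on topic but not on master.
log_count=$(git log --oneline master..topic | wc -l | tr -d ' ')

# Snapshot comparison: every path that differs between the two tips.
two_dot_files=$(git diff --name-only master..topic | tr '\n' ' ')
two_dot_check=$(git -c "$ws" diff --check master..topic || true)

# Merge base vs. topic: only what topic itself changed.
three_dot_files=$(git diff --name-only master...topic | tr '\n' ' ')
three_dot_check=$(git -c "$ws" diff --check master...topic || true)

echo "commits in master..topic:  $log_count"
echo "files in master..topic:    $two_dot_files"
echo "files in master...topic:   $three_dot_files"
echo "--check master..topic:     ${two_dot_check:-<clean>}"
echo "--check master...topic:    ${three_dot_check:-<clean>}"
```

In this sketch the log walk sees a single commit, while the two-dot diff lists both shared.txt and topic.txt and flags the trailing space in shared.txt, a file the topic branch never touched. Seen from master's snapshot, the old line with the trailing space looks like an addition. The three-dot form lists only topic.txt and reports nothing.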


    [1] To see a combined diff, use git show on a merge commit. The git diff command will also produce combined diffs with some particular arguments, sometimes incorrectly: in particular the three-dot syntax, git diff A...B, is meant to compare the merge base of A and B to commit B, but sometimes does something different. Also, when you are using the index and the index contains a conflicted merge, plain git diff will produce combined diffs.

    [2] Technically, it does not even have to extract the two snapshots; it just works directly from their tree objects. It does have to extract differing blobs, in order to compute the difference. For identical blobs, git diff knows they are identical because their hash IDs match. But it's easier to reason about this as "extract and compare".
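If the goal is to check only what the branch itself changed, two approaches follow from the above. One is to diff against the merge base explicitly, which is what the three-dot form is meant to do. The other is to mirror git log -p and check each commit against its parent. The following is a rough sketch rather than a definitive recipe. The helper names check_branch and check_each_commit are made up here, and the second helper assumes every commit in the range has a parent (no root commits).

```shell
#!/bin/sh
# Whitespace rules taken from the question; adjust as needed.
ws="core.whitespace=trailing-space,cr-at-eol"

# check_branch BASE TIP (hypothetical helper)
# Checks the net change from the fork point of BASE and TIP up to TIP.
check_branch() {
    fork_point=$(git merge-base "$1" "$2") || return 1
    git -c "$ws" diff --check "$fork_point" "$2"
}

# check_each_commit BASE TIP (hypothetical helper)
# Checks every non-merge commit in BASE..TIP against its first parent,
# oldest first; returns non-zero if any commit has a problem.
check_each_commit() {
    status=0
    for commit in $(git rev-list --no-merges --reverse "$1..$2"); do
        git -c "$ws" diff --check "$commit^" "$commit" || status=1
    done
    return "$status"
}

# Example (run inside a repository that has both refs):
#   check_branch master HEAD
#   check_each_commit master HEAD
```

The two helpers answer slightly different questions: check_branch reports only errors that survive in the branch's final state, whereas check_each_commit also reports an error that one commit introduced and a later commit removed.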