
Running git diff-tree with --numstat and --name-status


I'm writing a script to analyze the changes that have been made to a git repo. At some point I need to iterate over all the commits and obtain the following information about each of them:

  • Commit ID
  • Date
  • Commit Message
  • ...
  • Files changed
    • File Name
    • Type of change (Added/Modified/Removed/Renamed)
    • New File Name (in case the change type is "Renamed")
    • Number of lines added
    • Number of lines removed

I get the commit messages and dates from git log. The issue I have is with the files.

If I didn't need the number of lines added/removed, I'd simply use

git diff-tree --no-commit-id --name-status -M -r abcd12345

The output would be something like

A   Readme.md
M   src/something.js
D   src/somethingelse.js
R100    tests/a/file.js tests/b/file.js

Which I can parse and read programmatically.

To get information about lines added/removed, I could use this:

git diff-tree -M -r --numstat abcd12345

The output would be like:

abcd12345
82  0   Readme.md
41  98  src/something.js
0   64  src/somethingelse.js
0   0   tests/{a => b}/file.js

Which is not that machine readable for renamed files.

My question is: Is there any way to combine these two commands? It seems I can't use --numstat with --name-status.

I can run two separate commands and merge the results in my script as well. In that case, are there any other switches I can use to make the output of the second command more machine-readable?

Thanks.


Solution

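  • I think your analysis (that you need two separate commands) is correct. Use -z with --numstat to get machine-readable output: it disables both the fancy rename encoding (the {a => b} form) and all special-character quoting. Note that records are then terminated by ASCII NULs instead of newlines, so you have to split the output on NULs. A renamed file is reported as the two counts followed by an empty path field, and the old path and the new path then follow as two separate NUL-terminated fields.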
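
As a rough illustration of the two-command approach, here is a minimal Python sketch that runs both commands with -z and merges the results. The question doesn't say what language the script is in, so Python is an assumption, and the function names (run_diff_tree, parse_numstat_z, parse_name_status_z, merge_changes) are made up for this example. It also assumes --no-commit-id is passed to both commands so the commit ID doesn't appear in the output, and that only -M (not -C) is used, so every final path is unique.

```python
import subprocess


def run_diff_tree(commit, mode):
    """Run git diff-tree for one commit; mode is "--numstat" or "--name-status"."""
    cmd = ["git", "diff-tree", "--no-commit-id", "-r", "-M", "-z", mode, commit]
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
    # Paths are raw bytes; surrogateescape keeps undecodable ones intact.
    return out.decode("utf-8", errors="surrogateescape")


def _count(field):
    # --numstat prints "-" instead of a number for binary files.
    return None if field == "-" else int(field)


def parse_numstat_z(raw):
    """Map each final path to an (added, removed) tuple."""
    tokens = raw.split("\0")
    stats = {}
    i = 0
    while i < len(tokens):
        if not tokens[i]:  # empty piece after the last NUL
            i += 1
            continue
        added, removed, path = tokens[i].split("\t", 2)
        if path:
            i += 1
        else:
            # Rename: empty path field, then old path, then new path.
            path = tokens[i + 2]
            i += 3
        stats[path] = (_count(added), _count(removed))
    return stats


def parse_name_status_z(raw):
    """Return a list of dicts with status, path and new_path."""
    tokens = raw.split("\0")
    changes = []
    i = 0
    while i < len(tokens) and tokens[i]:
        status = tokens[i][0]  # "R100" -> "R"
        if status in "RC":
            changes.append({"status": status, "path": tokens[i + 1],
                            "new_path": tokens[i + 2]})
            i += 3
        else:
            changes.append({"status": status, "path": tokens[i + 1],
                            "new_path": None})
            i += 2
    return changes


def merge_changes(name_status_raw, numstat_raw):
    """Attach line counts to each name-status record, keyed by final path."""
    stats = parse_numstat_z(numstat_raw)
    merged = []
    for change in parse_name_status_z(name_status_raw):
        key = change["new_path"] or change["path"]
        added, removed = stats.get(key, (None, None))
        merged.append({**change, "added": added, "removed": removed})
    return merged
```

For a single commit this would be called as merge_changes(run_diff_tree(sha, "--name-status"), run_diff_tree(sha, "--numstat")), where sha is the commit ID. The parsers take plain strings, so they can be tried on captured output without touching a repository.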