I want to get the line count for each revision, but I think running git checkout on each revision and then wc -l is too time-consuming. So I compute the line count for the first revision, then get the number of lines added and deleted between each revision and one of its parents. This way, the line count for a revision should be its parent's line count, plus the lines added between them, minus the lines deleted between them. However, I found that this formula doesn't hold for some merge revisions. Can anyone give me a hint?
Take the merge commit a4632dad6cf5ecdbcd8e4f357c69f3b34afc04f0 of dubbo as an example. The line count for a4632dad is 155784, obtained by running git ls-files -- "*.java" | xargs cat | wc -l. One of its parents is 4f3017c7, whose line count is 175829. I then got the lines changed between them with git diff --shortstat 4f3017c7 a4632dad -- "*.java". The result is 226 files changed, 3174 insertions(+), 23239 deletions(-). But 175829 + 3174 - 23239 = 155764, which does not equal 155784.
A merge commit is just a regular commit that happens to have more than one parent.
In the case you mention, a4632dad6cf5 is a merge commit. Its first parent is 5b0ab1143b25, and its second parent is 4f3017c71849:
$ git log -1 --format=fuller a4632dad6cf5
commit a4632dad6cf5ecdbcd8e4f357c69f3b34afc04f0 (HEAD)
Merge: 5b0ab114 4f3017c7
Author: [...]
AuthorDate: Thu Jan 25 14:01:50 2018 +0800
Commit: [...]
CommitDate: Thu Jan 25 14:01:50 2018 +0800
Merge branch '2.5.x'
[...]
If this were a regular commit, it would simply not have the second parent 4f3017c71849.
To compare the state before and after this merge from the perspective of the branch the merge happened on, diff the first parent 5b0ab1143b25 against the merge a4632dad6cf5:
$ git diff --shortstat 5b0ab1143b25..a4632dad6cf5 -- '*.java'
65 files changed, 976 insertions(+), 323 deletions(-)
Compare this to the line counts at each of the two commits:
$ git checkout 5b0ab1143b25 2>/dev/null
$ git ls-files -- "*.java" | xargs cat | wc -l
155131
$ git checkout a4632dad6cf5 2>/dev/null
$ git ls-files -- "*.java" | xargs cat | wc -l
155784
The difference in line count between the two is 653:
155784 - 155131 = 653
The net change between 5b0ab1143b25 and a4632dad6cf5 (insertions minus deletions) is also 653:
976 - 323 = 653
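The same check can be written as shell arithmetic, using only the numbers quoted above. The variable names are made up for illustration:

```shell
# Numbers taken from the commands above.
parent_loc=155131   # line count at the first parent 5b0ab1143b25
insertions=976      # from git diff --shortstat against the first parent
deletions=323

# Line count of the merge = first parent's count + insertions - deletions.
merge_loc=$((parent_loc + insertions - deletions))
echo "$merge_loc"   # prints 155784, matching the count at a4632dad6cf5
```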
Here is one way to count lines without a checkout (it will work in a bare repository), though checking out each commit may still be faster. This relies on the somewhat naïve assumption that everything whose path ends in .java is a blob object:
$ git ls-tree -r 5b0ab1143b25 | grep '\.java$' | awk -F' ' '{print $3}' | xargs -n1 git cat-file blob | wc -l
155131
$ git ls-tree -r a4632dad6cf5 | grep '\.java$' | awk -F' ' '{print $3}' | xargs -n1 git cat-file blob | wc -l
155784
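Putting the pieces together, a running total along the first-parent chain could be sketched roughly as follows. This is only a sketch: the function names net_delta and loc_history are hypothetical, and it assumes that diffing each commit against its predecessor on the first-parent chain (starting from the empty tree) is what you want. It uses git diff --numstat rather than --shortstat because the per-file numbers are easier to parse.

```shell
#!/bin/sh
# Sketch: *.java line count per commit along the first-parent chain,
# computed without any checkout. Helper names are hypothetical.

# Read `git diff --numstat` output on stdin and print added minus deleted.
# Binary files are reported as "-<TAB>-<TAB>path" and are skipped.
net_delta() {
    awk -F'\t' '$1 != "-" { net += $1 - $2 } END { print net + 0 }'
}

# Print "<commit> <line count>" for each commit reachable from $1
# (default: HEAD) by following first parents only, oldest first.
loc_history() {
    # The empty tree stands in as the "parent" of the root commit.
    prev=$(git hash-object -t tree /dev/null)
    total=0
    for commit in $(git rev-list --first-parent --reverse "${1:-HEAD}"); do
        delta=$(git diff --numstat "$prev" "$commit" -- '*.java' | net_delta)
        total=$((total + delta))
        echo "$commit $total"
        prev=$commit
    done
}
```

Running loc_history a4632dad6cf5 inside the repository would then print one line per first-parent commit. Note that the totals count lines the way git diff does, so they can differ slightly from wc -l, which counts newline characters and therefore comes up one short for any file whose last line has no trailing newline.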