Example to understand git blame -M / -C


I am trying to understand how git blame -M and git blame -C work.

I created two files:

FileA:

A
B
C

FileB:

D
E
F

and added (and committed) them to my repository (hash 1234). Then I copied the content of FileB to FileA so that it looks like this:

FileA:

A
B
C
D
E
F

and committed the changes (hash 4567).

Then I ran git blame -C FileA.

I expected the output:

1234 A
1234 B
1234 C
1234 D
1234 E
1234 F

but instead got:

1234 A
1234 B
1234 C
4567 D
4567 E
4567 F

The same happens when I move the block D E F to FileA and run git blame -M FileA.

Did I misinterpret the purpose of -C and -M or did I miss something when constructing the test files?

Update 1: Setting the value of -C and -M to 3 did not help, nor did using larger text (I tried three paragraphs of lorem ipsum).


Solution

  • From git help blame:

       -C|<num>|
           In addition to -M, detect lines moved or copied from other files that were modified in the same commit. This is
           useful when you reorganize your program and move code around across files. When this option is given twice, the
           command additionally looks for copies from other files in the commit that creates the file. When this option is given
           three times, the command additionally looks for copies from other files in any commit.
    
           <num> is optional but it is the lower bound on the number of alphanumeric characters that git must detect as
           moving/copying between files for it to associate those lines with the parent commit. **And the default value is 40**. If
           there are more than one -C options given, the <num> argument of the last -C will take effect.
    

    Notice the "And the default value is 40"? Your example only copies 3 alphanumeric characters (6 characters if you count the newlines), which is well below the threshold of 40.

    I suspect your test input is not large enough for the algorithms to detect the text movement...
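To check a test input against that threshold, you can count its alphanumeric characters directly. The sketch below does this for the D/E/F block from the question. The variable name `alnum_count` is only illustrative, and the pipeline approximates what the man page describes rather than reproducing git's own scoring code.

```shell
# Count only the alphanumeric characters in the copied block (D, E, F on separate lines).
# The $(( ... )) wrapper strips any padding that some wc implementations add.
alnum_count=$(( $(printf 'D\nE\nF\n' | tr -cd '[:alnum:]' | wc -c) ))
echo "$alnum_count"   # prints 3, far below the default -C threshold of 40
```

Replacing the `printf` with `cat yourfile` gives a rough idea of whether a larger test input has a chance of being detected.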

    Edit: there's also the bit in there about "other files that were modified in the same commit". So here's an example:

    $ git init /tmp/foo
    Initialized empty Git repository in /tmp/foo/.git/
    $ cd /tmp/foo
    $ cp /etc/motd file1
    $ cp /etc/magic file2
    $ cp /etc/os-release file3
    $ git add file1 file2 file3
    $ git commit -m baseline
    [master (root-commit) 36a1d7] baseline
     3 files changed, 19 insertions(+)
     create mode 100644 file1
     create mode 100644 file2
     create mode 100644 file3
    $ head -5 file2 >> file1
    $ head -5 file3 >> file1
    $ sed -i 1,5d file3
    $ git add file1 file3
    $ git commit -m second
    [master b7a683] second
     2 files changed, 8 insertions(+), 5 deletions(-)
    $ git log --pretty=oneline
     b7a683 (HEAD, master) second
     36a1d7 baseline
    $ git blame file1
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  1) 
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  2) The programs included with the Debian GNU/Linux system are free software;
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  3) the exact distribution terms for each program are described in the
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  4) individual files in /usr/share/doc/*/copyright.
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  5) 
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  6) Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
     ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  7) permitted by applicable law.
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500  8) # Magic local data for file(1) command.
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500  9) # Insert here your local magic data. Format is described in magic(5).
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 10) 
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 11) PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 12) NAME="Debian GNU/Linux"
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 13) VERSION_ID="7"
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 14) VERSION="7 (wheezy)"
     b7a6839 (Joe User 2015-03-26 17:21:41 -0500 15) ID=debian
    $ git blame -C file1
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  1) 
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  2) The programs included with the Debian GNU/Linux system are free software;
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  3) the exact distribution terms for each program are described in the
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  4) individual files in /usr/share/doc/*/copyright.
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  5) 
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  6) Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
     ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  7) permitted by applicable law.
     b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500  8) # Magic local data for file(1) command.
     b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500  9) # Insert here your local magic data. Format is described in magic(5).
     b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500 10) 
     ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 11) PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
     ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 12) NAME="Debian GNU/Linux"
     ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 13) VERSION_ID="7"
     ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 14) VERSION="7 (wheezy)"
     ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 15) ID=debian
    

    Note that without -C, git blame simply attributes the new lines to the second commit. With -C, it attributes the last 5 lines to file3 from the first commit, because 1) that's where they came from, 2) the segment is large enough, and 3) file3 was also modified in the second commit. The lines from file2 are not recognized because, although the segment is large enough, file2 was not modified in the second commit.
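The same reasoning applies to the setup in the question: FileB was not touched in the commit that copied its content into FileA, so a single -C never looks at it. According to the man page text quoted above, giving -C three times makes git look for copies from other files in any commit. The following is a minimal sketch of that, assuming a throwaway repository created with `mktemp`. The file contents, the identity settings, and the variable names `single` and `triple` are made up for illustration.

```shell
#!/bin/sh
# Sketch: the question's scenario, with lines long enough to clear the 40-character threshold.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"        # placeholder identity for the throwaway repo
git config user.email "user@example.com"

for i in 1 2 3; do
    echo "FileA line $i: the quick brown fox jumps over the lazy dog"
done > FileA
for i in 1 2 3; do
    echo "FileB line $i: pack my box with five dozen liquor jugs"
done > FileB
git add FileA FileB
git commit -q -m baseline

# Copy FileB into FileA; FileB itself is left unmodified in this commit.
cat FileB >> FileA
git commit -q -a -m "copy FileB into FileA"

# Count how many lines of FileA are traced back to FileB in each mode.
single=$(git blame -C --line-porcelain FileA | grep -c '^filename FileB' || true)
triple=$(git blame -C -C -C --line-porcelain FileA | grep -c '^filename FileB' || true)
echo "lines traced to FileB with -C: $single"
echo "lines traced to FileB with -C -C -C: $triple"
```

With a single -C the copied lines should still be blamed on the second commit, while -C -C -C should trace them to FileB in the baseline commit. The triple form inspects every file in the parent tree, so expect it to be noticeably slower on a large repository.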

    Also note the difference between -M, which detects content moved within a single file, and -C, which detects content moved or copied between different files.
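To see -M on its own, the block has to move within one file. Here is a small sketch under the same assumptions as before (a scratch repository, invented file contents, illustrative variable names `plain` and `with_m`). Per the git blame man page, -M has a lower default threshold than -C, namely 20 alphanumeric characters, so the lines are kept comfortably longer than that.

```shell
#!/bin/sh
# Sketch: move two lines inside one file and compare blame with and without -M.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"        # placeholder identity for the throwaway repo
git config user.email "user@example.com"

for i in 1 2 3 4 5 6; do
    echo "notes line $i: the quick brown fox jumps over the lazy dog"
done > notes.txt
git add notes.txt
git commit -q -m baseline

# Move the first two lines to the end of the same file.
{ sed 1,2d notes.txt; sed -n 1,2p notes.txt; } > moved.txt
mv moved.txt notes.txt
git commit -q -a -m "move two lines to the end"

# Count the lines that blame attributes to the commit that did the move.
head_commit=$(git rev-parse HEAD)
plain=$(git blame --line-porcelain notes.txt | grep -c "^$head_commit " || true)
with_m=$(git blame -M --line-porcelain notes.txt | grep -c "^$head_commit " || true)
echo "lines blamed on the move commit without -M: $plain"
echo "lines blamed on the move commit with -M: $with_m"
```

Without -M the two relocated lines are charged to the commit that moved them. With -M they should be followed back to the baseline commit, where they were first written.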