Git word diff – how not to break apart certain words?


I need to apply a git word diff that treats every character as a word, except for characters enclosed in square brackets. For those, I want everything inside the brackets (even whitespace), the brackets themselves, and the text they replace to be treated as whole words, so that none of them is ever broken up.

Per the docs, "--word-diff-regex=. will treat each character as a word [...]". So far so good, but I haven't been able to combine this with my other requirement of never breaking up brackets etc.

Here's a screenshot of the problem:

Since the letter 'e' appears both in 'The' and in 'future', it is reused and not marked as a change. I believe this is a result of using the regular expression . (a single dot). At the end, however, where only the letter 's' in 'possibilities' is highlighted instead of the entire word being replaced, that regular expression does exactly what I want.
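
To see why the 'e' is reused, it helps to look at the words each side is cut into. The sketch below approximates git's word splitting with GNU grep (-o prints every non-overlapping match on its own line, -E selects POSIX ERE). words() is a hypothetical helper, not part of git, and grep is only assumed to split a single line the same way git does.

```shell
#!/bin/sh
# Sketch: approximate how git cuts a line into "words" for a given
# --word-diff-regex, using GNU grep (-o: one match per line, -E: POSIX ERE).
# words() is a hypothetical helper: words REGEX TEXT
words() {
    printf '%s\n' "$2" | LC_ALL=C grep -oE -e "$1"
}

# With the regex ".", every character is a word of its own.
echo "old words:"; words '.' 'The'
echo "new words:"; words '.' '[Our future]'

# Report each old word that also occurs on the new side; only such words
# can be paired up by the diff and shown as unchanged.
for w in $(words '.' 'The'); do
    if words '.' '[Our future]' | grep -qxF -e "$w"; then
        echo "shared word: $w"    # prints: shared word: e
    fi
done
```

Only the single letter e is common to both sides, which is exactly the character the word diff keeps.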

The whole thing should look like this:

Here, both the bracketed text and the text it replaces are treated as cohesive units, while the word 'possibilities' at the end is still compared character by character, which is perfect.

I have played with various regular expressions, including using XOR, to no avail. My guess is that, in many of my attempts, . 'swallows' whatever limiting expression I combine it with. I have gotten the specific example above to work with [a-zA-Z0-9\[]]|\s|\\n|[^[:space:]], but I don't know why (regex noob) and it doesn't work in other scenarios.
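
A likely reason that regex happened to work: in ERE, [a-zA-Z0-9\[] is a single bracket expression (letters, digits, a backslash and [), and the ] after it is a literal, so that branch only matches a character directly followed by ]. The sketch below reuses the same approximation with GNU grep (assumed, since \s is a GNU extension) and the hypothetical words() helper. It shows that the last letter before ] is glued to it, so the 'e' of 'The' has no partner, while an old text that shares other letters still gets the brackets pulled apart.

```shell
#!/bin/sh
# Sketch: list the words produced by the regex from the question,
# approximating git's word splitting with GNU grep.
# words() is a hypothetical helper: words REGEX TEXT
words() {
    printf '%s\n' "$2" | LC_ALL=C grep -oE -e "$1"
}

# Branch 1: one of [a-zA-Z0-9\[] followed by a literal ']'.
# \s is a GNU extension; \\n is a literal backslash followed by n, not a newline.
re='[a-zA-Z0-9\[]]|\s|\\n|[^[:space:]]'

words "$re" '[Our future]'   # [ O u r (space) f u t u r e]

# "e]" is one word, so the e of "The" finds nothing to pair with.
# An old text such as "Our" still shares O, u and r with the new side,
# so the diff would keep them and split the bracketed group apart.
for w in $(words "$re" 'Our'); do
    if words "$re" '[Our future]' | grep -qxF -e "$w"; then
        echo "shared word: $w"
    fi
done
```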

Keep in mind that git uses "POSIX “extended” regular expressions and not PCRE or Perl Compatible Regular Expressions, e.g. [:space:] instead of \s for the whitespace character class."
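
As a small illustration of that point, using GNU grep -E as an assumed stand-in for git's ERE engine: whitespace and non-whitespace are spelled with the [:space:] class inside a bracket expression. \s is not part of POSIX ERE, so whether it works depends on the regex library git was built with.

```shell
#!/bin/sh
# Sketch: POSIX bracket classes instead of \s (GNU grep -E assumed).
line=$(printf 'a\tb c')   # a, TAB, b, SPACE, c

# Whitespace characters (the tab and the space): prints 2.
printf '%s\n' "$line" | grep -oE '[[:space:]]' | grep -c ''

# Non-whitespace characters, one per line: a, b, c.
printf '%s\n' "$line" | grep -oE '[^[:space:]]'
```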


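Solution

Put a branch that matches a whole bracketed group in front of the one-character fallback: '\[[^]]*\]?|.'. Reading it left to right, \[ matches a literal opening bracket, [^]]* matches everything up to the next closing bracket, whitespace included (a ] placed right after ^ is taken literally), \]? matches that closing bracket if there is one, and |. makes any other single character a word of its own. Because the bracketed text then forms a single word, none of its letters can be paired with letters of the old text, so in this example the text it replaces is reported as one deletion instead of being interleaved with it. The transcript below compares plain . with this regex: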

    $ sh -x <<\EOD; rm -rf test
    git init test; cd test
    echo The possibilities > test; git add .; git commit -m-
    sed -i 's,The,[Our future],' test; git commit -am-
    git show --oneline --word-diff-regex=.
    git show --oneline --word-diff-regex='\[[^]]*\]?|.'
    EOD
    + git init test
    Initialized empty Git repository in /home/jthill/sandbox/test/test/.git/
    + cd test
    + echo The possibilities
    + git add .
    + git commit -m-
    [master (root-commit) 06604d0] -
     1 file changed, 1 insertion(+)
     create mode 100644 test
    + sed -i 's,The,[Our future],' test
    + git commit -am-
    [master 6728cd4] -
     1 file changed, 1 insertion(+), 1 deletion(-)
    + git show --oneline --word-diff-regex=.
    6728cd4 (HEAD -> master) -
    diff --git a/test b/test
    index c6eb735..1187549 100644
    --- a/test
    +++ b/test
    @@ -1 +1 @@
    [-Th-]{+[Our futur+}e{+]+} possibilities
    + git show --oneline '--word-diff-regex=\[[^]]*\]?|.'
    6728cd4 (HEAD -> master) -
    diff --git a/test b/test
    index c6eb735..1187549 100644
    --- a/test
    +++ b/test
    @@ -1 +1 @@
    [-The-]{+[Our future]+} possibilities
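
To avoid typing the regex every time, git can read it from the diff.wordRegex configuration variable, which --word-diff then uses (an explicit --word-diff-regex still overrides it); per-path setups can use a diff driver in .gitattributes together with diff.<driver>.wordRegex instead. The sketch below demonstrates the config route in a throwaway repository. The demo_word_diff name, the temporary directory and the demo identity are made up for illustration, and git and mktemp are assumed to be on the PATH.

```shell
#!/bin/sh
# Sketch: set diff.wordRegex in a throwaway repo and show a plain word diff.
# demo_word_diff is a hypothetical helper, not a git command.
demo_word_diff() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        git init -q .
        # Same regex as above: a whole [...] group, or any single character.
        git config diff.wordRegex '\[[^]]*\]?|.'
        echo 'The possibilities' > test
        git add test
        git -c user.name=demo -c user.email=demo@example.com commit -qm init
        echo '[Our future] possibilities' > test
        # Working tree vs. index, word diff in plain [-...-]{+...+} markup.
        git --no-pager diff --no-color --word-diff=plain
    )
    status=$?
    rm -rf "$dir"
    return $status
}

demo_word_diff
```

In an existing repository, running the git config line once is enough; git diff --word-diff and git show --word-diff then pick the regex up.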