git diff has the option --word-diff-regex=<...>, a regex that matches words. There are special default values for some languages (as mentioned in man 5 gitattributes). But what are they? The docs don't describe them, and I looked through the git sources but couldn't find them either.
Any ideas?
EDIT: I'm on git 1.9.1, but I'll accept answers for any version.
The default word regexes are in the sources, in the userdiff.c file. The PATTERNS and IPATTERN macros take the base word regex as their third parameter and append "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" to it. The [^[:space:]] alternative makes every non-whitespace character that isn't part of a larger word count as a word by itself, and the [\xc0-\xff][\x80-\xbf]+ alternative, which assumes UTF-8, keeps multi-byte characters from being split up. For example, in:
PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),
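the resulting word regex is "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" (still written as a C string literal, so \\\\ stands for the regex \\, which matches one literal backslash).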
In this case, the |[\xc0-\xff][\x80-\xbf]+ alternative happens to add nothing, since everything matched by [\xc0-\xff][\x80-\xbf]+ is already matched by [a-zA-Z0-9\x80-\xff]+, but it doesn't cause any harm either.
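The redundancy argument can be written out as a set inclusion. H, A, C and W are just labels introduced for this sketch: H is the byte range \x80-\xff, A is \xc0-\xff, C is \x80-\xbf, and W is the class [a-zA-Z0-9\x80-\xff].

```latex
A = \{\mathtt{0xc0},\dots,\mathtt{0xff}\} \subseteq H,\qquad
C = \{\mathtt{0x80},\dots,\mathtt{0xbf}\} \subseteq H,\qquad
H = \{\mathtt{0x80},\dots,\mathtt{0xff}\} \subseteq W
\;\Longrightarrow\;
L(A\,C^{+}) = \{\, a\,c_1 \cdots c_k : a \in A,\ c_i \in C,\ k \ge 1 \,\} \subseteq W^{+} = L(W^{+})
```

Under POSIX leftmost-longest matching, the match chosen for the whole alternation depends only on which spans the alternatives can match, so an alternative whose matches are all matches of another alternative cannot change the result.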