I need a word-based diff between these two simple lines:
<h1>Intro line 06.03.2004</h1>
and:
<h1>Intro line 15.08.2024</h1>
The comparison command:
git diff --word-diff file1.txt file2.txt
The output:
<h1>Intro line [-06.03.2004</h1>-]{+15.08.2024</h1>+}
My problem is that it also marks the trailing </h1>, although this part hasn't changed. I've also tried the CLI switch --minimal, but with no success. How can I reduce the marked change to the bare minimum? Any advice is appreciated!
How can I reduce the marked change to the bare minimum?
Careful: "minimum" has a precise meaning, and you don't want it. The truly minimal diff would be:
<h1>Intro line [-06-]{+15+}.0[-3-]{+8+}.20[-0-]{+2+}4</h1>
and practically unreadable.
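To see that character-level diff for yourself, you can make every single character its own word with the regex "." (a minimal sketch; the two sample files are temporary copies of the lines from the question, created by the script itself):

```shell
#!/bin/sh
# Sketch: reproduce the character-level ("truly minimal") word diff.
# The two sample files are temporary copies of the lines from the question.
dir=$(mktemp -d)
printf '<h1>Intro line 06.03.2004</h1>\n' > "$dir/file1.txt"
printf '<h1>Intro line 15.08.2024</h1>\n' > "$dir/file2.txt"

# "." matches any single character except a newline, so every character
# becomes its own word and only the changed characters are marked.
# --no-index compares two plain files; it exits with 1 when they differ,
# so "|| true" keeps that expected status from looking like a failure.
git --no-pager diff --no-index --no-color --word-diff=plain \
    --word-diff-regex=. "$dir/file1.txt" "$dir/file2.txt" || true
```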
By default, --word-diff treats any run of non-whitespace characters as a single word. So what you see is exactly what is documented in git help diff!
What you need to do is specify a different --word-diff-regex. You could simply use something like [^<> ]+ (with + rather than *, so the pattern never matches an empty string), but that will not be good enough once you actually start changing tags.
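A minimal sketch of that suggestion, run on temporary copies of the two sample lines. The second command is a variation that goes beyond what is described here (an assumption, not a tested recommendation): it also makes a whole tag such as <h1> a single word, so a changed tag would be marked as one unit rather than only the letters between < and >.

```shell
#!/bin/sh
# Sketch: word diff in which '<', '>' and spaces separate words.
# The two sample files are temporary copies of the lines from the question.
dir=$(mktemp -d)
printf '<h1>Intro line 06.03.2004</h1>\n' > "$dir/file1.txt"
printf '<h1>Intro line 15.08.2024</h1>\n' > "$dir/file2.txt"

# Every match of the regex is one word; everything between matches (here
# '<', '>' and spaces) is treated like whitespace and ignored when git
# looks for differences. '+' (not '*') so a match is never empty.
# "|| true": --no-index exits with 1 when the files differ.
git --no-pager diff --no-index --no-color --word-diff=plain \
    --word-diff-regex='[^<> ]+' "$dir/file1.txt" "$dir/file2.txt" || true

# Assumed variation: whole tags are words too, so <h1> -> <h2> would be
# shown as [-<h1>-]{+<h2>+} instead of <[-h1-]{+h2+}>.
git --no-pager diff --no-index --no-color --word-diff=plain \
    --word-diff-regex='<[^>]*>|[^<> ]+' "$dir/file1.txt" "$dir/file2.txt" || true
```

For the sample lines, both commands should print <h1>Intro line [-06.03.2004-]{+15.08.2024+}</h1>, i.e. the unchanged </h1> is no longer part of the marked change.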
The real question here is what you need these diffs for. You might be better off using something like git diff --name-only REVISION to get the changed file names, filtering them for XML content, then extracting each old version with git --no-pager show REVISION:path/to/changed/file > tmp and comparing the actual XML with a program like xmldiffs. This assumes your documents are XML. If they are true HTML, it won't work as-is: HTML sadly isn't the friendly XML dialect one might be tempted to think it is. XHTML is, but HTML5 in general is not.
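A rough sketch of that workflow as a shell function. Assumptions: it runs from the repository's top level (git diff --name-only prints paths relative to it), "XML content" simply means modified files ending in .xml, and the comparison tool comes from a hypothetical XML_DIFF variable, e.g. XML_DIFF=xmldiffs if that tool accepts two file arguments; without it, the sketch falls back to plain diff -u.

```shell
#!/bin/sh
# Sketch of the workflow: list changed XML files, extract the old version
# of each from a revision, then compare old and new with an XML-aware tool.
# Assumptions: run from the repository's top level; XML_DIFF is a
# hypothetical variable naming the comparison tool (e.g. xmldiffs), with
# plain "diff -u" as a fallback so the sketch runs without extra installs.
compare_changed_xml() {
    rev=${1:-HEAD}              # hypothetical default revision
    tool=${XML_DIFF:-diff -u}
    tmp=$(mktemp)
    # Only modified (--diff-filter=M) files matching *.xml, so files that
    # were added or deleted since $rev are skipped.
    git diff --name-only --diff-filter=M "$rev" -- '*.xml' |
    while IFS= read -r path; do
        # The version of the file as stored in $rev.
        git --no-pager show "$rev:$path" > "$tmp"
        echo "=== $path"
        # $tool is deliberately unquoted so "diff -u" splits into command
        # and option; diff-like tools exit with 1 when the files differ.
        $tool "$tmp" "$path" || true
    done
    rm -f "$tmp"
}
```

For example, after setting XML_DIFF=xmldiffs, running compare_changed_xml HEAD~3 would compare every XML file modified since HEAD~3 against its current working-tree version.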