
Getting a diff to properly recognize code blocks that contain similar content


I am diffing files that contain code blocks with similar contents, and this similarity can confuse the diff. This is hard to explain in words, so I will start with an example.

file1.txt:

    text
    (
        contents
    )
    block "block1"
    (
        contents
    )
    block "block2"
    (
        contents
    )

file2.txt:

    block "block1"
    (
        contents
    )
    text
    (
        contents
    )
    block "block2"
    (
        contents
    )

When I diff these two files, I get the following output:

    -text
    +block "block1"
     (
         contents
     )
    -block "block1"
    +text
     (
         contents
     )
     block "block2"
     (
         contents
     )

The problem is that the diff program (Perl's Text::Diff in this case, although git diff, which I also have available, does the same thing) doesn't recognize that "block" code blocks are entirely independent of "text" code blocks and should be treated as separate entities.
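For what it's worth, the output above is the shorter edit script for these two files: it keeps 10 of the 12 lines and changes 2 on each side, whereas the output requested below deletes 4 lines and inserts 4. Tools that aim for a short line diff (Text::Diff is built on Algorithm::Diff's longest-common-subsequence routine, and git's default algorithm also aims for a small diff) therefore have no reason to prefer the "block-aware" version. A quick reproduction follows; it uses Python's standard `difflib` purely because it is easy to run. Its matching heuristic is not the one Text::Diff uses, so agreement on this input is illustrative only.

```python
import difflib

# The two example files from the question, one list element per line.
file1 = ['text', '(', '    contents', ')',
         'block "block1"', '(', '    contents', ')',
         'block "block2"', '(', '    contents', ')']
file2 = ['block "block1"', '(', '    contents', ')',
         'text', '(', '    contents', ')',
         'block "block2"', '(', '    contents', ')']

sm = difflib.SequenceMatcher(None, file1, file2)
# Each opcode describes how to turn file1[i1:i2] into file2[j1:j2].
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, file1[i1:i2], '->', file2[j1:j2])

# The same alignment rendered as a unified diff.
for line in difflib.unified_diff(file1, file2, 'file1.txt', 'file2.txt', lineterm=''):
    print(line)
```

The opcodes come out as replace, equal (3 lines), replace, equal (7 lines), which is the same `-text` / `+block "block1"` pairing shown in the question.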

How can I make a diff recognize these different types of code blocks as separate entities, so that a diff of these two files produces the following output instead?

    -text
    -(
    -    contents
    -)
     block "block1"
     (
         contents
     )
    +text
    +(
    +    contents
    +)
     block "block2"
     (
         contents
     )

Note that this is a drastically simplified version of the code I am actually trying to diff. I understand that it is easy enough to figure out what this example is doing, but with hundreds of similar elements the diff output becomes completely unreadable.

I want the diff to realize that only a "text" code block was edited in this modification and no "block" code blocks were touched.
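The idea behind `git diff --patience` can be sketched in a few dozen lines. This is a simplified illustration in Python, not git's implementation, and all function names are hypothetical. It (1) finds lines that occur exactly once in each file, (2) keeps the largest subset of those that appear in the same order in both files, found with patience sorting, and (3) recurses into the gaps between these anchor lines. When a gap contains no unique lines, the sketch just keeps common leading and trailing lines and deletes/inserts the rest; real implementations run a full diff on such gaps instead.

```python
from bisect import bisect_left


def unique_common(a, b, alo, ahi, blo, bhi):
    """(i, j) pairs for lines occurring exactly once in a[alo:ahi] and once in b[blo:bhi]."""
    count_a, count_b, pos_a, pos_b = {}, {}, {}, {}
    for i in range(alo, ahi):
        count_a[a[i]] = count_a.get(a[i], 0) + 1
        pos_a[a[i]] = i
    for j in range(blo, bhi):
        count_b[b[j]] = count_b.get(b[j], 0) + 1
        pos_b[b[j]] = j
    pairs = [(pos_a[x], pos_b[x]) for x in count_a
             if count_a[x] == 1 and count_b.get(x) == 1]
    return sorted(pairs)  # ordered by position in a


def longest_increasing(pairs):
    """Patience sorting: longest run of pairs whose b-positions also increase."""
    tops, top_idx = [], []        # b-position and pair index on top of each pile
    back = [None] * len(pairs)    # link to the top of the previous pile
    for k, (_, j) in enumerate(pairs):
        p = bisect_left(tops, j)  # leftmost pile whose top is >= j
        if p > 0:
            back[k] = top_idx[p - 1]
        if p == len(tops):
            tops.append(j)
            top_idx.append(k)
        else:
            tops[p], top_idx[p] = j, k
    chain, k = [], (top_idx[-1] if top_idx else None)
    while k is not None:
        chain.append(pairs[k])
        k = back[k]
    return chain[::-1]


def _diff(a, b, alo, ahi, blo, bhi, ops):
    anchors = longest_increasing(unique_common(a, b, alo, ahi, blo, bhi))
    if anchors:
        for i, j in anchors:
            _diff(a, b, alo, i, blo, j, ops)   # gap before this anchor
            ops.append((" ", a[i]))            # the anchor line is unchanged
            alo, blo = i + 1, j + 1
        _diff(a, b, alo, ahi, blo, bhi, ops)   # gap after the last anchor
        return
    # Simplified fallback: keep common leading/trailing lines,
    # delete what is left of a and insert what is left of b.
    tail = []
    while alo < ahi and blo < bhi and a[alo] == b[blo]:
        ops.append((" ", a[alo]))
        alo, blo = alo + 1, blo + 1
    while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:
        ahi, bhi = ahi - 1, bhi - 1
        tail.append((" ", a[ahi]))
    ops.extend(("-", a[i]) for i in range(alo, ahi))
    ops.extend(("+", b[j]) for j in range(blo, bhi))
    ops.extend(reversed(tail))


def patience_diff(a, b):
    ops = []
    _diff(a, b, 0, len(a), 0, len(b), ops)
    return ops


def render(ops):
    return "\n".join(tag + line for tag, line in ops)


file1 = ['text', '(', '    contents', ')',
         'block "block1"', '(', '    contents', ')',
         'block "block2"', '(', '    contents', ')']
file2 = ['block "block1"', '(', '    contents', ')',
         'text', '(', '    contents', ')',
         'block "block2"', '(', '    contents', ')']
print(render(patience_diff(file1, file2)))
```

In this example only `text`, `block "block1"` and `block "block2"` are unique in both files, so the repeated `(`, `contents`, `)` lines cannot pull the alignment apart. Two anchor sets tie at length 2 ({block1, block2} and {text, block2}). This sketch's tie-breaking keeps block1, which yields exactly the output requested above.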


Solution

  • If you're OK with using git directly, try git diff --patience:

    $ git diff --patience
    diff --git a/foo1.txt b/foo1.txt
    index b474449..30a91bb 100644
    --- a/foo1.txt
    +++ b/foo1.txt
    @@ -1,11 +1,11 @@
    -text
    -(
    -    contents
    -)
     block "block1"
     (
         contents
     )
    +text
    +(
    +    contents
    +)
     block "block2"
     (
         contents