python web-scraping compare difflib

Compare two files and return only the changes from the second


Good morning everyone. I'm trying to compare two .txt files and return only the lines that were added in the second one.

Is it possible to do this? My code is below.

Thank you so much.

For example, file1 contains:

The Last Late Night (2023) [4k 2160p][Esp]
Rabos The Musical (2023) [4k 2160p][Esp]

And file2 contains:

Fancy Dance (2023) [4k 2160p][Esp]
The Last Late Night (2023) [4k 2160p][Esp]

When I compare them it returns:

different
--- file1.txt
+++ file2.txt
@@ -0,0 +1 @@
+Fancy Dance (2023) [4k 2160p][Esp]

@@ -2 +2,0 @@
-Rabos The Musical (2023) [4k 2160p][Esp]

The only thing I would like it to return is: different +Fancy Dance (2023) [4k 2160p][Esp]

I'm not interested in what has disappeared from file1 (the lines starting with -), nor in the @@ lines, nor in the file names.

# LIBRARIES
import difflib
import filecmp
import sys
import requests

# WE COMPARE THE TWO FILES.
with open('file1.txt', encoding='utf8') as file_1: 
    file_1_text = file_1.readlines() 

with open('file2.txt', encoding="utf8") as file_2: 
    file_2_text = file_2.readlines() 

iguales = filecmp.cmp('file1.txt', 'file2.txt')
if iguales:
    print("Equal")
    sys.exit(1)
else:
    print("different")  

    # WE LOOK FOR THE DIFFERENCES AND PRINT THEM.
    for line in difflib.unified_diff( 
            file_1_text, file_2_text, fromfile='file1.txt',  
            tofile='file2.txt', n=0, lineterm=''): 
        print(line) 

    def telegram_bot_sendtext(bot_message):
        bot_token = 'YOUR TOKEN'
        bot_chatID = 'YOUR CHAT ID'
        send_text = ('https://api.telegram.org/bot' + bot_token
                     + '/sendMessage?chat_id=' + bot_chatID
                     + '&parse_mode=Markdown&text=' + bot_message)

        response = requests.get(send_text)

        return response.json()


    test = telegram_bot_sendtext(line)

sys.exit(1)

Solution

  • There are two ways to do it.

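    1. Write your own comparison, something like this. It returns an iterator over the lines that appear only in the second file. Because it uses sets, the original order of the lines is not preserved and duplicate lines are collapsed.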
    from collections.abc import Iterator
    
    
    def compare(
            filename1: str,
            filename2: str,
            lineterm: str = '\n') -> Iterator[str]:
    
        def read_lines(filename: str) -> Iterator[str]:
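            with open(filename, encoding='utf8') as file:
                for line in file:
                    # Make sure every line ends with '\n' so that a missing
                    # final newline does not make equal lines look different.
                    yield line if line.endswith('\n') else line + '\n'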
    
        file1_lines = set(read_lines(filename1))
        file2_lines = set(read_lines(filename2))
        for current_line in file2_lines - file1_lines:
            yield f'+{current_line[:-1]}{lineterm}'
    
    # list(compare('file1.txt', 'file2.txt'))
    # ['+Fancy Dance (2023) [4k 2160p][Esp]\n']
    
    2. Take the output of difflib.unified_diff and filter out the lines that should be ignored:
    from collections.abc import Iterator
    from difflib import unified_diff
    
    
    def patched_unified_diff(
           a,
           b,
           fromfile='',
           tofile='',
           fromfiledate='',
           tofiledate='',
           n=3,
           lineterm='\n') -> Iterator[str]:
    
       it = unified_diff(
           a,
           b,
           fromfile,
           tofile,
           fromfiledate,
           tofiledate,
           n,
           lineterm,
       )
       for result_line in it:
           # Keep added lines ('+...') but skip the '+++' file header.
           if result_line.startswith('+') and not result_line.startswith('+++'):
               yield result_line
    
    # list(patched_unified_diff(file_1_text, file_2_text))
    # ['+Fancy Dance (2023) [4k 2160p][Esp]\n']
    

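    Both approaches return the same lines for this example. Note that difflib.unified_diff compares lines together with their line endings, so each file should end with a line break; otherwise a final line without '\n' will not match the same line followed by '\n' in the other file and will be reported as a change.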
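
As a possible third variant, here is a minimal sketch that keeps the added lines in file order and does not depend on a trailing line break. It swaps in difflib.SequenceMatcher instead of filtering unified_diff output, and strips line endings before comparing. The function name added_lines and the in-memory sample lists are assumptions made for illustration; they are not part of the question's code.

```python
from difflib import SequenceMatcher


def added_lines(old_lines, new_lines):
    """Return the lines present in new_lines but not matched in old_lines,
    in the order they appear in new_lines (hypothetical helper)."""
    # Strip line terminators so a missing final newline cannot make
    # two otherwise identical lines compare as different.
    old = [line.rstrip('\r\n') for line in old_lines]
    new = [line.rstrip('\r\n') for line in new_lines]

    matcher = SequenceMatcher(None, old, new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' are the opcodes that contribute new lines.
        if tag in ('insert', 'replace'):
            added.extend(new[j1:j2])
    return added


# Sample data from the question; the last line of each list deliberately
# has no trailing newline.
file1 = ['The Last Late Night (2023) [4k 2160p][Esp]\n',
         'Rabos The Musical (2023) [4k 2160p][Esp]']
file2 = ['Fancy Dance (2023) [4k 2160p][Esp]\n',
         'The Last Late Night (2023) [4k 2160p][Esp]']

# Join everything into one message so it can be sent in a single call.
message = '\n'.join('+' + line for line in added_lines(file1, file2))
print(message)
```

Building one joined string means a notification function such as the question's telegram_bot_sendtext would only need to be called once with all added lines, rather than with whichever diff line happened to be last.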