python string-comparison text-comparison

Compare text across multiple files and remove the duplicate blocks

I have 3 text files (created in Notepad) in a directory. I want to compare the 1st file to the other 2 and drop the duplicate blocks from those two files, keeping only the blocks that are unique. For example:

File 1:

  User enter email id {
  email id:(xyz@gamil.com)
  action:enter
  data:string }

User enter password {
passoword:(12345678)
action:enter
data:string }

 User click login {
 action:click
 data:NAN }

File 2:

User enter email id {
email id:(xyz@gamil.com)
action:enter
data:string }

User enter password {
passoword:(12345678)
action:enter
data:string }

 User navigates another page {
 action:navigates
 data:NAN }

File 3:

 User enter email id {
 email id:(abc@gamil.com)
 action:enter
 data:string }

 User enter password {
 passoword:(12345678)
 action:enter
 data:string }

 User submit to login {
 action:submit
 data:NAN }

The output I want for file 2 and file 3 is:

File 2:

 User navigates another page {
 action:navigates
 data:NAN }

File 3:

 User enter email id {
 email id:(abc@gamil.com)
 action:enter
 data:string }
 
 User submit to login {
 action:submit
 data:NAN }

Solution

Open the first file and split its contents into a list of paragraphs (blocks separated by a blank line):

with open('file1.txt', 'r') as f:
    paragraphs = f.read().split('\n\n')
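
Note that `split('\n\n')` only separates blocks when the line between them is completely empty. If a "blank" line contains a stray space or tab, or if blocks are separated by several blank lines, a regular expression is more forgiving. The following is a minimal sketch of that idea; `split_blocks` is a hypothetical helper name, not part of the original answer:

```python
import re

def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    # \n\s*\n matches a line break, optional whitespace (including further
    # blank lines), and another line break, so "blank" lines that contain
    # stray spaces or tabs still act as separators.
    blocks = re.split(r'\n\s*\n', text)
    # Strip surrounding whitespace from each block and drop empty ones
    # (for example, the one produced by a trailing newline).
    return [block.strip() for block in blocks if block.strip()]

# The separator line below contains a single space instead of being empty.
sample = (
    "User click login {\naction:click\ndata:NAN }\n"
    " \n"
    "User submit to login {\naction:submit\ndata:NAN }\n"
)
print(split_blocks(sample))
```

With this helper, `f.read().split('\n\n')` could be swapped for `split_blocks(f.read())` in the snippets that follow.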

Now open the second file, split it into paragraphs the same way, and drop every paragraph that also appears in the first file:

with open('file2.txt', 'r') as f:
    paragraphs2 = f.read().split('\n\n')
    paragraphs2 = [x for x in paragraphs2 if x not in paragraphs]

Now write the remaining paragraphs back to the second file:

with open('file2.txt', 'w') as f:
    f.write('\n\n'.join(paragraphs2))

Perform the same operations on the third file:

with open('file3.txt', 'r') as f:
    paragraphs3 = f.read().split('\n\n')
    paragraphs3 = [x for x in paragraphs3 if x not in paragraphs]

with open('file3.txt', 'w') as f:
    f.write('\n\n'.join(paragraphs3))
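
The `x not in paragraphs` test above is an exact string comparison, so two blocks that differ only in indentation or trailing spaces are treated as different. If that matters for your files, one option is to compare a normalized form of each block while still writing the original text back. This is a sketch under that assumption; `normalize` and `drop_duplicates` are hypothetical helper names:

```python
def normalize(block):
    """Return the block with per-line leading/trailing whitespace removed."""
    lines = (line.strip() for line in block.strip().splitlines())
    # Skip lines that are empty after stripping.
    return '\n'.join(line for line in lines if line)

def drop_duplicates(reference_blocks, candidate_blocks):
    """Keep the candidate blocks whose normalized form is not in the reference."""
    # A set makes each membership test fast, even with many blocks.
    seen = {normalize(block) for block in reference_blocks}
    # Return the original (un-normalized) text of the blocks that survive.
    return [block for block in candidate_blocks if normalize(block) not in seen]

# Same block as in the reference, but with different indentation.
reference = ["  User enter email id {\n  email id:(xyz@gamil.com)\n  action:enter\n  data:string }"]
candidates = [
    "User enter email id {\nemail id:(xyz@gamil.com)\naction:enter\ndata:string }",
    " User navigates another page {\n action:navigates\n data:NAN }",
]
print(drop_duplicates(reference, candidates))
```

Only the comparison is normalized; the blocks that are kept retain their original formatting.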

What if there are many files? Use a loop, as demonstrated below.

First, create the list of paragraphs from the first file:

with open('file1.txt', 'r') as f:
    paragraphs = f.read().split('\n\n')

Create a list of all the other files that duplicates have to be removed from:

import os
lst = [f for f in os.listdir('.') if f.endswith('.txt') and f != 'file1.txt']

Now loop through the list of files and modify them

for f in lst:
    # Read the file and keep only the paragraphs that are not in the first file
    with open(f, 'r') as file:
        paragraphs_in_other_files = file.read().split('\n\n')
        paragraphs_in_other_files = [p for p in paragraphs_in_other_files if p not in paragraphs]

    # Overwrite the file with the remaining paragraphs
    with open(f, 'w') as file:
        file.write('\n\n'.join(paragraphs_in_other_files))
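
The loop overwrites every `.txt` file in the current directory, so it may be worth wrapping it in a function and trying it on a throwaway directory first. The sketch below does that with `pathlib` and `tempfile`. The function name `remove_shared_blocks`, its parameters, and the demo file contents are assumptions for illustration. It also strips the file text before splitting, so that a trailing newline does not stop the last block from matching:

```python
import tempfile
from pathlib import Path

def remove_shared_blocks(directory, reference_name='file1.txt'):
    """Remove from every other .txt file the blocks found in the reference file.

    Returns the names of the files that were rewritten.
    """
    directory = Path(directory)
    # strip() removes the trailing newline so the last block compares equal
    # to the same block appearing in the middle of another file.
    reference = set((directory / reference_name).read_text().strip().split('\n\n'))
    rewritten = []
    for path in sorted(directory.glob('*.txt')):
        if path.name == reference_name:
            continue
        blocks = path.read_text().strip().split('\n\n')
        kept = [block for block in blocks if block not in reference]
        if kept != blocks:  # only touch files that actually change
            path.write_text('\n\n'.join(kept) + '\n')
            rewritten.append(path.name)
    return rewritten

# Demo in a temporary directory so no real files are modified.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'file1.txt').write_text('A {\nx }\n\nB {\ny }\n')
    (tmp / 'file2.txt').write_text('A {\nx }\n\nC {\nz }\n')
    (tmp / 'file3.txt').write_text('D {\nw }\n')
    rewritten = remove_shared_blocks(tmp)
    file2_after = (tmp / 'file2.txt').read_text()
    file3_after = (tmp / 'file3.txt').read_text()

print(rewritten)
```

Once the result looks right, the same function can be pointed at the real directory, for example `remove_shared_blocks('.')`.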