Tags: bash, awk, grep, formatting, text-processing

How can I search file1.txt and file2.txt for matching names and print the matching lines to a new file?


Problem: I need help with a task where I have two text files, file1.txt and file2.txt. The files have similar formats, but the names are on different line numbers, and they have different numbers of lines. The task is to check which names in file1.txt match the names in file2.txt, and then print the matching lines from file2.txt into a new file (file3.txt).

Example file formats:

file1.txt:

NAME:FLAT
Jerome:Flat 6
Jimmy:Flat 4

file2.txt:

0:NAME:JOB:MONEY:FLAT
1:Bob:Developer:$500:Flat 7
2:Jerome:Gardener:$50:Flat 6
3:Cindy:Graphics:$100:Flat 5
4:Jimmy:Mod:$150:Flat 4

What I want to achieve: I want to compare the names in file1.txt (e.g., Jerome, Jimmy) and check if they also exist in file2.txt.

I want to output only the matching lines from file2.txt; names in file2.txt that don’t appear in file1.txt should be ignored. For example, "Bob" and "Cindy" appear in file2.txt but not in file1.txt, so their lines should be skipped, while the lines for "Jerome" and "Jimmy" should be copied from file2.txt into a new file (file3.txt).

Example of expected output: since Jerome and Jimmy from file1.txt also appear in file2.txt, the output file (file3.txt) should look like this:

file3.txt:

2:Jerome:Gardener:$50:Flat 6
4:Jimmy:Mod:$150:Flat 4

What I have tried: Here is the code I have tried so far, which uses awk to do the matching:

awk -F ":" 'FNR==NR{a[$1];next}($1 in a){print}' file2.txt file1.txt > file3.txt
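For reference, the attempt above produces no output with the sample data. Because file2.txt is listed first, FNR==NR is true while file2.txt is being read, so the array is filled with its first field (0, 1, 2, ...). The names from file1.txt ("NAME", "Jerome", "Jimmy") are then looked up against those numbers and never match, and even a match would print the line from file1.txt rather than from file2.txt. A minimal sketch of the same one-liner with the file order swapped and the lookup moved to field 2 (the name column of file2.txt) might look like this. The temporary directory and the sample-file setup are assumptions added only to make it runnable.

```shell
#!/bin/sh
# Work in a throwaway directory so the sample files don't clobber real ones.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample inputs copied from the question.
printf '%s\n' 'NAME:FLAT' 'Jerome:Flat 6' 'Jimmy:Flat 4' > file1.txt
printf '%s\n' '0:NAME:JOB:MONEY:FLAT' '1:Bob:Developer:$500:Flat 7' \
  '2:Jerome:Gardener:$50:Flat 6' '3:Cindy:Graphics:$100:Flat 5' \
  '4:Jimmy:Mod:$150:Flat 4' > file2.txt

# Same idea as the original attempt, with two fixes:
#  - file1.txt (the list of names) is read first, so FNR==NR applies to it;
#  - the lookup uses $2, because the name is the 2nd field of file2.txt.
# FNR > 1 keeps file1.txt's header ("NAME") out of the array, so the
# header line of file2.txt is not copied either.
awk -F ':' 'FNR==NR { if (FNR > 1) a[$1]; next } ($2 in a)' file1.txt file2.txt > file3.txt

cat file3.txt
```

With the sample data this should leave only the Jerome and Jimmy lines in file3.txt.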

What I need help with: If anyone could help me figure out whether this is possible or offer a better solution, I’d really appreciate it!


Solution

  • With your shown samples, please try the following (written and tested with GNU awk).

    awk '
    BEGIN  { FS=":" }
    FNR==1 { next   }
    FNR==NR{
      arr[$1]
      next
    }
    ($2 in arr)
    ' file1.txt file2.txt > file3.txt
    

    Explanation: a detailed, line-by-line explanation of the program above.

    awk '                    ##Start the awk program.
    BEGIN  { FS=":" }        ##In the BEGIN block, set the field separator (FS) to ":".
    FNR==1 { next   }        ##Skip the first (header) line of each input file.
    FNR==NR{                 ##This condition is TRUE only while file1.txt is being read.
      arr[$1]                ##Store the name (field 1 of file1.txt) as a key in arr.
      next                   ##Skip the remaining statements and move to the next line.
    }
    ($2 in arr)              ##For file2.txt: if field 2 (the name) is a key in arr, print the line.
    ' file1.txt file2.txt > file3.txt   ##Input files; matching lines are written to file3.txt.
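To try the answer on the question's sample data, the sketch below rebuilds file1.txt and file2.txt in a temporary directory (an assumption, so that existing files are not overwritten) and runs the same awk program with its output redirected to file3.txt.

```shell
#!/bin/sh
# Work in a throwaway directory so the sample files don't clobber real ones.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample inputs copied from the question.
printf '%s\n' 'NAME:FLAT' 'Jerome:Flat 6' 'Jimmy:Flat 4' > file1.txt
printf '%s\n' '0:NAME:JOB:MONEY:FLAT' '1:Bob:Developer:$500:Flat 7' \
  '2:Jerome:Gardener:$50:Flat 6' '3:Cindy:Graphics:$100:Flat 5' \
  '4:Jimmy:Mod:$150:Flat 4' > file2.txt

# The answer's program: skip headers, collect names from file1.txt,
# then print lines of file2.txt whose 2nd field is one of those names.
awk '
BEGIN  { FS=":" }
FNR==1 { next   }
FNR==NR{
  arr[$1]
  next
}
($2 in arr)
' file1.txt file2.txt > file3.txt

cat file3.txt
```

With the sample data, file3.txt should contain just the Jerome and Jimmy lines, matching the expected output in the question.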