I have 2 pipe delimited files. yesterday.txt and today.txt
yesterday.txt:
1234|12|Bill|Blatt|programmer
3243|34|Bill|Blatt|dentist
98734|25|Jack|Blatt|programmer
748567|31|Mark|Spark|magician
today.txt:
123|12|Bill|Blatt|programmer
3243|4|Bill|Blatt|dentist
934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber
I would like to compare the 2 files while ignoring the first 2 fields and output any lines unique to the second file (today.txt). However, I want the full lines in the output even though the comparison omits the first 2 fields. So in the case above the output would be:
new_today.txt
934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber
I tried to accomplish using this:
sort <(cut -d"|" -f3- yesterday.txt) <(cut -d"|" -f3- yesterday.txt) <(cut -d"|" -f3- today.txt) | uniq -u
This almost works, but it doesn't give me the 2 fields that I cut. I'm not sure how to accomplish this. Any help would be much appreciated.
When the first file is small enough to hold in memory, an efficient solution is possible using Awk, without sorting:
awk -F'|' -v OFS='|' '
NR == FNR {
    $1 = "";
    $2 = "";
    seen[$0]++;
}
NR != FNR {
    orig = $0;
    $1 = "";
    $2 = "";
    if (!seen[$0]) print orig
}' yesterday.txt today.txt
As a one-liner: awk -F'|' -v OFS='|' 'NR == FNR { $1 = ""; $2 = ""; seen[$0]++ } NR != FNR { orig = $0; $1 = ""; $2 = ""; if (!seen[$0]) print orig }' yesterday.txt today.txt
For the example input files this outputs:
934|25|Jack|Blatt|prograbber
30495|89|Dave|Scratt|slobber
Here's how it works:
-F'|'
-- use pipe as the field separator.
NR == FNR
-- this matches lines in the first input file (yesterday.txt). Clear the first 2 fields ($1, $2), use the rest ($0) as the key, and count it.
NR != FNR
-- this matches lines not in the first input file, i.e. lines of today.txt. Save the original line in orig, clear the first 2 fields, and print the saved original only if its key was not seen in the first file.
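To see what the comparison key looks like, here is a minimal sketch (not part of the solution itself): assigning to $1 and $2 makes awk rebuild the record using OFS, so the key keeps only the fields you care about.

echo '1234|12|Bill|Blatt|programmer' | awk -F'|' -v OFS='|' '{ $1 = ""; $2 = ""; print $0 }'

which prints

||Bill|Blatt|programmer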
Notice that this approach also preserves the original order of the lines in the second file.
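If you want the result in a file named new_today.txt as in the question, it's just the one-liner above plus a redirect:

awk -F'|' -v OFS='|' 'NR == FNR { $1 = ""; $2 = ""; seen[$0]++ } NR != FNR { orig = $0; $1 = ""; $2 = ""; if (!seen[$0]) print orig }' yesterday.txt today.txt > new_today.txt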