This is a follow up question from: Unexpected result comparing values of rows and columns in two text files
I have created a structure to compare two text files according to their rows and columns. Here are the file structures:
file1.txt
Name Col1 Col2 Col3
-----------------------
row1 1 4 7
row2 2 5 8
row3 3 6 9
file2.txt
Name Col1 Col2 Col3
-----------------------
row1 1 4 7
row2 2 5 999
Here is the code I have so far:
dos2unix ravi # 2>/dev/null
dos2unix ravi2 # 2>/dev/null
awk '
FNR < 2 {next} # skips first two lines
FNR == NR {
for (i = 2; i <= NF; i++) {
a[i,$1] = $i;
}
b[$1];
next;
}
($1 in b) { # check if row in file2 existed in file1
for (i = 2; i <= NF; i++) {
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i);
}
}
!($1 in b) { # check if row in file2 doesn't exist in file1.
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs %s: Are Not Equal\n", $1, i-1, "blank", $i);
}
// pattern needed to check if row in file1 doesn't exist in file2.
' $PWD/file1.txt $PWD/file2.txt
Does anyone have any tips, suggestions or hints to have a pattern in the awk
statement to check if row in file1 doesn't exist in file2. See the sample outputs below to understand what I mean. (ie: Basically, I want to print the values of row3 in file1 doesn't exist in file2). Thanks! Let me know if further explanation is needed.
Desired Output:
row2->Col1: 1 vs 1: Equal
row2->Col2: 4 vs 4: Equal
row2->Col3: 7 vs 7: Equal
row1->Col1: 2 vs 2: Equal
row1->Col2: 5 vs 5: Equal
row1->Col3: 8 vs 999: Not Equal
row3->Col1: 3 vs (blank) : Not Equal
row3->Col2: 6 vs (blank) : Not Equal
row3->Col3: 9 vs (blank) : Not Equal
Actual Output:
row2->Col1: 1 vs 1: Equal
row2->Col2: 4 vs 4: Equal
row2->Col3: 7 vs 7: Equal
row1->Col1: 2 vs 2: Equal
row1->Col2: 5 vs 5: Equal
row1->Col3: 8 vs 999: Not Equal
Extending your answer:
$ cat script.awk
FNR < 2 { next } # skips first two lines
FNR == NR {
for (i = 2; i <= NF; i++) { a[i,$1] = $i }
b[$1];
next;
}
($1 in b) { # check if row in file2 existed in file1
for (i = 2; i <= NF; i++) {
if (a[i,$1] == $i)
printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i);
else
printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i);
}
delete b[$1]; # delete entries which are processed
}
END {
for (left in b) { # look which didn't match
for (i = 2; i <= NF; i++)
printf("%s->col%d: %s vs (blank): Not Equal\n", left, i-1, a[i,left])
}
}
Run it like:
$ awk -f script.awk file1 file2
row1->col1: 1 vs 1: Are Equal
row1->col2: 4 vs 4: Are Equal
row1->col3: 7 vs 7: Are Equal
row2->col1: 2 vs 2: Are Equal
row2->col2: 5 vs 5: Are Equal
row2->col3: 8 vs 999: Not Equal
row3->col1: 3 vs (blank): Not Equal
row3->col2: 6 vs (blank): Not Equal
row3->col3: 9 vs (blank): Not Equal