Awk file compare

I have two files, each listing a component name and a version number separated by a space:

cat file1
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
com.acc.invm:newSer 10.10.10

cat file2 
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110.10.10

The required output is:

(1) The list of components that are present in file1 but not in file2.
(2) The list of components that are present in file2 but not in file1.

In this example the desired output is:

components from file1:

com.acc.invm:newSer 10.10.10

components from file2:

com.acc.invm:distri_cob 110.10.10

NOTE: Components that are present in both files with different versions must be ignored.

My code is:

(1)

 cat new.awk
 { split($2,a,/\./); curr = a[1]*10000 + a[2]*100 + a[3] }
 NR==FNR { prev[$1] = curr; next }
 !($1 in prev) && (curr > prev[$1])

 /usr/bin/nawk -f new.awk f2 f1

OUTPUT

com.acc.invm:newSer 10.10.10

(2)

/usr/bin/nawk -f new.awk f1 f2

OUTPUT

com.acc.invm:distri_cob 110.10.10

Is this logic correct?

Also, how can I put the contents of new.awk into my script itself, so that a separate new.awk file is not required to run this?

Solution

You can print the unique components from both files with a single invocation of awk. Save the following as cf.awk:

# Save all the components from the first file into an array
NR == FNR { a[$1] = $0; next }

# If a component from the second file is found, delete it from the array
$1 in a { delete a[$1]; next }

# If a component in the second file is not found, print it
{ print }

# Print all the components from the first file that weren't in the second
END { for (i in a) print a[i] }


$ cat file1
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
com.acc.invm:newSer 10.10.10


$ cat file2
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110.10.10


$ awk -f cf.awk file2 file1
com.acc.invm:newSer 10.10.10
com.acc.invm:distri_cob 110.10.10
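
On the first part of the question: the posted new.awk does produce the expected lines for this data, but the version arithmetic does no useful work. When $1 is not in prev, prev[$1] is empty and compares as 0, so curr > prev[$1] is true for any non-zero version, and a new component with version 0.0.0 would be silently skipped. Since versions are to be ignored anyway, the membership test alone should be enough. Below is a minimal sketch of that reduced check. It assumes a POSIX shell with mktemp -d available, and it calls plain awk instead of /usr/bin/nawk. The scratch directory and the variable names are only for illustration.

```shell
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
com.acc.invm:newSer 10.10.10
EOF

cat > "$dir/file2" <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110.10.10
EOF

# While reading the first file argument, record every component name ($1).
# While reading the second, print lines whose name was never recorded.
# The version field ($2) is not examined at all.
prog='NR == FNR { seen[$1]; next } !($1 in seen)'

only_in_file1=$(awk "$prog" "$dir/file2" "$dir/file1")
only_in_file2=$(awk "$prog" "$dir/file1" "$dir/file2")

echo "$only_in_file1"   # com.acc.invm:newSer 10.10.10
echo "$only_in_file2"   # com.acc.invm:distri_cob 110.10.10

rm -r "$dir"
```

As in the original approach, this takes two runs with the file order swapped. The single-pass script above avoids the second run.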

For the second part of your question, if you want to run this without having the code in a separate awk file, you can just have the code inline like so:

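 awk 'NR==FNR {a[$1]=$0; next} $1 in a {delete a[$1]; next} 1; END {for (i in a) print a[i]}' file2 file1

(Note that the 1 before the END is the same as having { print }, since 1 is always true and print is the default action. The semicolon after it ends that pattern-only rule; without it, some awk implementations report a syntax error at END.)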
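
If the comparison lives inside a larger shell script, the awk program can also be kept readable by spreading it over several lines inside single quotes. The sketch below wraps it in a shell function and prints the two lists under separate headings, as in the desired output from the question. The function name compare_components, the only2 array, and the scratch directory are hypothetical choices for this example, and it assumes a POSIX shell with mktemp -d available. Note that comments inside the quoted awk program must not contain a single quote.

```shell
# compare_components FILE1 FILE2
# Hypothetical helper: lists components unique to each file, ignoring versions.
compare_components() {
    awk '
        # First file: remember each line, keyed by component name
        NR == FNR { a[$1] = $0; next }

        # Second file: a name also seen in the first file is common, drop it
        $1 in a { delete a[$1]; next }

        # Second file only: collect the line instead of printing right away
        { only2[++n] = $0 }

        END {
            print "components from file1:"
            for (name in a) print a[name]      # traversal order is unspecified
            print "components from file2:"
            for (i = 1; i <= n; i++) print only2[i]
        }
    ' "$1" "$2"
}

# Demonstration with the sample data from the question.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
com.acc.invm:newSer 10.10.10
EOF

cat > "$dir/file2" <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110.10.10
EOF

result=$(compare_components "$dir/file1" "$dir/file2")
echo "$result"

rm -r "$dir"
```

Collecting the second file's unique lines in an array, rather than printing them as they are read, is what lets both headings appear in a fixed order from the END block.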