I have a fixed-width file like the one below, where characters 1-9 and 18-21 together form the key. Based on that key, I am trying to produce an output file without duplicates, keeping only the last occurrence of each key.
In File
12345ABCD78.90200ABCD
12345ABCD90.45300ABCD
11111EFGH56.75100ABCD
12345ABCD34.45400ABCD
11111EFGH75.90200ABCD
Out File
12345ABCD34.45400ABCD
11111EFGH75.90200ABCD
I have tried using awk as below, but I am not able to get the last occurrence of each duplicate. Can anyone help with this?
awk -v df=Duplicates_File.dat -v of=Output_wdout_Duplicate.dat '
(substr($0, 1, 18),substr($0, 174, 3)) in key {
print > df
next
}
{ key[substr($0, 1, 18),substr($0, 174, 3)]
print > of
}' Inputfile
Please try the following awk code. Written and tested with the shown samples.
awk '{arr[substr($0,1,9),substr($0,18,4)]=$0} END{for(i in arr){print arr[i]}}' Input_file
Explanation: The array arr is indexed by the first 9 characters plus characters 18 to 21 of each line, and the current line is stored as its value. Every later line with the same key overwrites the earlier one, so once the whole Input_file has been read, each element holds the last occurrence for its key. The END block then prints all elements of the array. Note that for (i in arr) visits elements in an unspecified order, so the printed lines may not follow the input order.
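If the output order matters, here is a minimal sketch of the same idea that also remembers the order in which keys were first seen. The array names last and order, the counter n, and the file name Out_file are my own choices for illustration; the sample data is copied from the question.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  '12345ABCD78.90200ABCD' \
  '12345ABCD90.45300ABCD' \
  '11111EFGH56.75100ABCD' \
  '12345ABCD34.45400ABCD' \
  '11111EFGH75.90200ABCD' > Input_file

awk '
{
    # Key = characters 1-9 plus characters 18-21.
    k = substr($0, 1, 9) SUBSEP substr($0, 18, 4)
    # Record each key once, in the order it first appears.
    if (!(k in last)) order[++n] = k
    # Later lines overwrite earlier ones, so the last occurrence wins.
    last[k] = $0
}
END {
    # Walk the keys by number instead of using for (i in arr).
    for (i = 1; i <= n; i++) print last[order[i]]
}' Input_file > Out_file

cat Out_file
```

This uses only POSIX awk features, so it should not depend on GNU awk. The lines come out in order of each key's first appearance, which for the shown samples matches the expected Out File.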
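2nd solution: Using GNU awk's FIELDWIDTHS variable, you can try the following. The widths "9 8 4" split each line into characters 1-9, 10-17 and 18-21, so $1 and $3 form the key; the trailing * (rest of the record) needs gawk 4.2 or later.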
awk 'BEGIN{FIELDWIDTHS = "9 8 4 *"} {arr[$1,$3]=$0} END{for(i in arr){print arr[i]}}' Input_file
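The attempt in the question also wrote the duplicates to a separate file. Here is a rough sketch of how that could be combined with the last-occurrence logic, assuming the superseded earlier lines are what should go to the duplicates file. The file names are taken from the question; last, order and n are hypothetical names of mine.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  '12345ABCD78.90200ABCD' \
  '12345ABCD90.45300ABCD' \
  '11111EFGH56.75100ABCD' \
  '12345ABCD34.45400ABCD' \
  '11111EFGH75.90200ABCD' > Inputfile

awk -v df=Duplicates_File.dat -v of=Output_wdout_Duplicate.dat '
{
    # Key = characters 1-9 plus characters 18-21.
    k = substr($0, 1, 9) SUBSEP substr($0, 18, 4)
    if (k in last)
        print last[k] > df      # earlier line is superseded: send it to the duplicates file
    else
        order[++n] = k          # first time this key is seen: remember its position
    last[k] = $0                # keep the most recent line for this key
}
END {
    # Write the last occurrence of each key, in first-seen key order.
    for (i = 1; i <= n; i++) print last[order[i]] > of
}' Inputfile
```

With the shown samples, Duplicates_File.dat should receive the first three input lines and Output_wdout_Duplicate.dat the last two.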