I have a file with four columns:
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 15
text3 b9 b4 25
text3 b9 b4 20
text4 h1 g8 50
text4 g1 k5 70
text4 g1 k5 80
text4 g1 k5 50
text5 y5 p3 25
I wanted the following result:
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 25
text4 h1 g8 50
text4 g1 k5 80
text5 y5 p3 25
That is, among rows whose first, second and third columns are identical, keep only the row with the highest value in the fourth column.
I tried the following, but it deduplicates on the first column only:
awk '!x[$1]++' file.txt
Using any sort with any awk:
$ sort -k4rn file | awk '!seen[$1,$2,$3]++'
text4 g1 k5 80
text4 h1 g8 50
text3 b9 b4 25
text5 y5 p3 25
text2 b2 b8 10
text1 a1 a2 5
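This works because, after the descending numeric sort on the fourth field, the first line awk meets for any key is the one with the largest value. The same pipeline is spelled out below with comments, as a sketch only; the wrapper name `max_per_key` is made up for illustration and is not part of sort or awk.

```shell
# max_per_key is a hypothetical wrapper name; $1 is the input file.
#
# sort -k4rn : order by field 4, numerically (n) and descending (r), so the
#              largest value for every key comes before the smaller ones.
# awk        : seen[k] is 0 the first time key k appears, so !seen[k]++ is
#              true only for that first (largest-value) line; every later
#              line with the same first three fields is skipped.
max_per_key() {
    sort -k4rn "$1" | awk '!seen[$1,$2,$3]++'
}
```

Note that the output comes out ordered by the fourth column, not in the original file order.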
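Or, without keeping all of the key values in memory in awk (sorting by the key first puts duplicates on adjacent lines, so awk only has to remember the previous key):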
$ sort -k1,3 -k4rn file | awk '{prev=key; key=$1 FS $2 FS $3} key!=prev'
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 25
text4 g1 k5 80
text4 h1 g8 50
text5 y5 p3 25
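Neither command above keeps the rows in the order of the question's expected output, where `text4 h1 g8 50` comes before `text4 g1 k5 80`. If first-appearance order matters, one option is a single awk pass with no sort, at the cost of holding one line per key in memory. This is a minimal sketch, assuming whitespace-separated fields and a numeric fourth column; the function name `dedupe_max` and the array names are invented for the example.

```shell
# dedupe_max is a hypothetical helper name; $1 is the input file.
dedupe_max() {
    awk '
    {
        key = $1 FS $2 FS $3
        if (!(key in max)) {
            order[++n] = key            # remember first-appearance order
            max[key] = $4
            line[key] = $0
        } else if ($4 + 0 > max[key] + 0) {
            max[key] = $4               # larger value: replace the stored line
            line[key] = $0
        }
    }
    # Print the stored line for each key, in the order the keys first appeared.
    END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$1"
}
```

Calling `dedupe_max file` on the sample input prints the six rows in the order the question asks for. When two rows of a key tie on the highest value, the earlier one is kept.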