Using awk to compare values in separate rows and multiple columns?


I have a file with four columns:

text1  a1  a2   5
text2  b2  b8   10
text3  b9  b4   15
text3  b9  b4   25
text3  b9  b4   20
text4  h1  g8   50
text4  g1  k5   70
text4  g1  k5   80
text4  g1  k5   50
text5  y5  p3   25

I wanted the following result:

text1  a1  a2   5
text2  b2  b8   10
text3  b9  b4   25
text4  h1  g8   50
text4  g1  k5   80
text5  y5  p3   25

That is, remove duplicate rows: when the first, second, and third columns are the same, keep only the row with the highest value in the fourth column.

I tried the following, but it only keeps the first row for each value of the first column:

awk '!x[$1]++' file.txt

Solution

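  • Using any sort with any awk: sort on the fourth column in descending numeric order so the highest value for each key comes first, then keep only the first line seen for each combination of the first three columns: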

    $ sort -k4rn file | awk '!seen[$1,$2,$3]++'
    text4  g1  k5   80
    text4  h1  g8   50
    text3  b9  b4   25
    text5  y5  p3   25
    text2  b2  b8   10
    text1  a1  a2   5
    

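    Or, to avoid holding every key in awk's memory, sort by the first three columns and then by the fourth in descending numeric order, and print a line only when its key differs from the previous line's key: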

    $ sort -k1,3 -k4rn file | awk '{prev=key; key=$1 FS $2 FS $3} key!=prev'
    text1  a1  a2   5
    text2  b2  b8   10
    text3  b9  b4   25
    text4  g1  k5   80
    text4  h1  g8   50
    text5  y5  p3   25
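
Both pipelines above print the rows in sorted order, not in the order shown in the question's desired result. If the original order matters, a minimal single-pass sketch in awk alone might look like the following. It assumes the set of distinct keys fits in memory; `sample.txt` and the function name `dedupe_max` are illustrative names, not part of the original answer.

```shell
# Write the sample input from the question to a scratch file
# (the file name sample.txt is an assumption for this sketch).
cat > sample.txt <<'EOF'
text1  a1  a2   5
text2  b2  b8   10
text3  b9  b4   15
text3  b9  b4   25
text3  b9  b4   20
text4  h1  g8   50
text4  g1  k5   70
text4  g1  k5   80
text4  g1  k5   50
text5  y5  p3   25
EOF

# Keep the row with the highest 4th-column value for each
# (col1, col2, col3) key, printing keys in first-seen order.
dedupe_max() {
    awk '
    {
        key = $1 SUBSEP $2 SUBSEP $3
        if (!(key in max)) {
            # First occurrence of this key: remember its position.
            order[++n] = key
            max[key] = $4 + 0
            line[key] = $0
        } else if ($4 + 0 > max[key]) {
            # A larger value for a known key replaces the stored row.
            max[key] = $4 + 0
            line[key] = $0
        }
    }
    END {
        for (i = 1; i <= n; i++) print line[order[i]]
    }' "$1"
}

dedupe_max sample.txt
```

The trade-off is the reverse of the second pipeline: no external sort is needed and input order is preserved, but awk holds one row per distinct key until the end of the input.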