Unix bash: select rows with unique value in one column, based on value of another column


I have a file with two columns that looks something like this:

1 3
1 4
2 3
3 3
4 3
4 4

I want to turn this into a file with unique values in the first column. Where several rows share the same first-column value, only the row with the largest value in the second column should be kept, so the new file looks like this:

1 4
2 3
3 3
4 4

Any ideas on how to achieve this using bash/awk/etc?


Solution

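  • With awk, you can use an associative array keyed on column 1 that stores the largest column-2 value seen so far for each key. The "$1 in a" test records the first value for a key unconditionally, so keys whose values are all zero or negative are still kept. awk does not guarantee the iteration order of "for (i in a)", so the output is piped through sort -n to restore numeric key order:

    awk '!($1 in a) || $2 > a[$1] {a[$1] = $2} END {for (i in a) print i, a[i]}' file | sort -n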
    
    1 4
    2 3
    3 3
    4 4
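
As an alternative to tracking the maximum in an array, the same result can be sketched with sort doing the ordering and awk keeping only the first row per key. This is a minimal sketch, not the original answer's method. It assumes both columns are numeric and whitespace-separated, and the file name input.txt is a placeholder chosen for illustration.

```shell
# Recreate the sample input from the question (input.txt is a hypothetical name).
printf '%s\n' '1 3' '1 4' '2 3' '3 3' '4 3' '4 4' > input.txt

# Sort by column 1 ascending, then by column 2 descending, so the first row
# seen for each column-1 key carries that key's largest column-2 value.
# awk then prints a row only the first time its key appears.
result=$(sort -k1,1n -k2,2nr input.txt | awk '!seen[$1]++')

printf '%s\n' "$result"
```

Because sort has already ordered the rows, the output comes out in key order without a separate sorting step, at the cost of sorting the whole file rather than making a single pass.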