bash, sorting, awk, cut

Sorting a column in bash by number of occurrences of words


I have text output with an IP address in one column and an HTTP status code in the other. I want to sort the rows by how often each status code occurs, so that

x.x 1
x.x 2
x.y 1
x.z 3
y.x 4
x.x 5
x.x 4
x.x 4

looks like

y.x 4
x.x 4
x.x 4
x.x 1
x.y 1
x.x 5
x.z 3
x.x 2

The ordering applies to the second column (the status codes); the IP addresses don't need to be sorted in any particular order.

Since 4 is the most common status code, it should come first, then 1, and so forth.

However, all I can find is how to count the occurrences with uniq, for example, which removes the duplicates and prefixes a count to each row.

As far as I can tell, the regular sort command does not support this either.

Any help would be appreciated.


Solution

  • You can use this awk + sort + cut combination:

    awk 'NR==FNR{++freq[$2]; next} {print freq[$2] "\t" $0}' file{,} | sort -k1nr | cut -f 2-
    x.x 4
    x.x 4
    y.x 4
    x.x 1
    x.y 1
    x.x 2
    x.x 5
    x.z 3
    

    Details:

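    1. The awk command reads the file twice (file{,} is bash brace expansion for file file). In the first pass, where NR==FNR, it counts how often each value of the 2nd field occurs; in the second pass it prints each record prefixed with that count and a tab.
    2. The sort command does a reverse numeric sort on the frequency field.
    3. The cut command strips the frequency column from the final output.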
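
The command above needs a file that awk can read twice, whereas the question mentions text that is "being outputted", possibly by another command. As a minimal sketch under that assumption, the variant below buffers the records inside awk so that it can read from a pipe in a single pass. The function name sort_by_freq is hypothetical, and -s (stable sort) is a GNU/BSD sort option rather than POSIX; it is used here so that rows with the same frequency keep their input order.

```shell
# sort_by_freq: hypothetical helper, not part of the original answer.
# Reads two-column records from stdin and prints them ordered by how
# often the 2nd field occurs, most frequent first.
sort_by_freq() {
  awk '
    # Buffer every record, remember its 2nd field, and count that field.
    { line[NR] = $0; key[NR] = $2; ++freq[$2] }
    END {
      # Prefix each buffered record with its frequency and a tab.
      for (i = 1; i <= NR; i++)
        print freq[key[i]] "\t" line[i]
    }
  ' |
    sort -s -k1,1nr |   # reverse numeric sort on the frequency only; -s keeps ties in input order
    cut -f 2-           # drop the frequency column again
}

# Usage with the sample data from the question:
printf '%s\n' 'x.x 1' 'x.x 2' 'x.y 1' 'x.z 3' 'y.x 4' 'x.x 5' 'x.x 4' 'x.x 4' | sort_by_freq
```

Because the whole input is held in awk arrays, this trades memory for the ability to read from a pipe; for very large inputs the two-pass version on a file may be the better fit.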