Tags: unix, sorting, awk, tr, uniq

Sort a column by number of identical occurrences - using awk, sort, tr or uniq?


Let's say I have some tab-separated data:

Peter   5
Joe     8
Peter   7
Peter   8
Joe     4
Laura   3

And I want to sort it by the number of times a name occurs in the first column (max to min). So we'd have Peter (3 occurrences), Joe (2 occurrences), and Laura (1 occurrence):

Peter   5
Peter   7
Peter   8
Joe     8
Joe     4
Laura   3

It only needs to be sorted by the first column, not the second. I've been reading sort's documentation, and I don't think it has this functionality. Does anyone have an easy method?


Solution

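  • Not elegant, but it works for your example. awk reads the file twice: the first pass counts each name, and the second prefixes every line with its name's count. sort -nr then orders the lines by that count, and sed strips the count again: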

     awk  'NR==FNR{a[$1]++;next}{ print a[$1],$0}' file file|sort -nr|sed -r 's/[0-9]* //'
    

    Test with your data (the order within each name group may differ from the input, which the question allows):

    kent$  cat n.txt
    Peter   5
    Joe     8
    Peter   7
    Peter   8
    Joe     4
    Laura   3
    
    kent$  awk  'NR==FNR{a[$1]++;next}{ print a[$1],$0}' n.txt n.txt|sort -nr|sed -r 's/[0-9]* //'
    Peter   8
    Peter   7
    Peter   5
    Joe     8
    Joe     4
    Laura   3
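
If the original order within each name should be kept as well (as in the desired output in the question), one possible variant is to carry two extra sort keys. This is only a sketch under assumptions: the file is really tab-separated, as the question states, and the names `count`, `first`, `input`, `tab`, and `result` are illustrative, not part of the answer above. It prefixes each line with the name's count, the line number where the name first appears, and the line's own number, sorts on those three keys, and cuts them off again.

```shell
# Hypothetical sample file holding the tab-separated data from the question.
tab=$(printf '\t')
input=$(mktemp)
printf 'Peter\t5\nJoe\t8\nPeter\t7\nPeter\t8\nJoe\t4\nLaura\t3\n' > "$input"

# Pass 1 (NR == FNR): count each name and remember the line where it first appears.
# Pass 2: prefix each line with count, first-appearance line, and its own line number.
# sort: count descending, then first appearance, then input order.
# cut: drop the three helper columns.
result=$(
  awk -F'\t' '
    NR == FNR { count[$1]++; if (!($1 in first)) first[$1] = FNR; next }
    { print count[$1] "\t" first[$1] "\t" FNR "\t" $0 }
  ' "$input" "$input" |
    sort -t "$tab" -k1,1nr -k2,2n -k3,3n |
    cut -f4-
)

printf '%s\n' "$result"
rm -f "$input"
```

The second key keeps names with equal counts from interleaving, and the third keeps each name's lines in input order, so the sample data should come out as Peter 5, 7, 8, then Joe 8, 4, then Laura 3.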