Tags: bash, sed, awk, median

Median of a column with awk


How can I use AWK to compute the median of a column of numerical data?

I can think of a simple algorithm, but I can't seem to program it.

What I have so far is:

    sort | awk 'END{print NR}'

This gives me the number of elements in the column. I'd like to use it to print a particular row (NR/2). If NR/2 is not an integer (NR is odd), I round it up to the nearest integer, and that row is the median. Otherwise (NR is even), the median is the average of the two middle rows, NR/2 and (NR/2)+1.
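The algorithm described above can be sketched as two passes over the sorted data. This is a minimal sketch, assuming a POSIX shell and a file with one number per line; `median_two_pass` is a hypothetical helper name. The first pass reuses the `awk 'END{print NR}'` count from the question, and the second prints the middle row for an odd count or averages rows NR/2 and NR/2 + 1 for an even count.

```shell
#!/bin/sh
# Two-pass sketch of the median algorithm described in the question.
# Assumptions: $1 is a file with one number per line;
# median_two_pass is a hypothetical name, not a standard tool.
median_two_pass() {
    file=$1
    # Pass 1: count the values, as in the question.
    n=$(sort -n "$file" | awk 'END { print NR }')
    # Pass 2: walk the numerically sorted values again and pick the middle row(s).
    sort -n "$file" | awk -v n="$n" '
        # odd count: row (n + 1) / 2 is the median
        n % 2 == 1 && NR == (n + 1) / 2 { print $1; exit }
        # even count: remember row n / 2 ...
        n % 2 == 0 && NR == n / 2       { lo = $1 }
        # ... and average it with row n / 2 + 1
        n % 2 == 0 && NR == n / 2 + 1   { print (lo + $1) / 2; exit }
    '
}
```

For example, `median_two_pass data_file` prints the median of `data_file`. This reads and sorts the file twice; the solution that follows avoids the second pass by storing every value in an array and choosing the middle one in the `END` block, once NR is known.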


Solution

  • This awk program assumes one column of numerically sorted data:

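    #!/usr/bin/awk -f
    # median.awk: print the median of column 1.
    # The input must already be sorted numerically (e.g. with sort -n).
    {
        vals[NR] = $1;              # store each value under its line number
    }
    END {
        if (NR == 0)
            exit 1;                 # empty input: there is no median
        if (NR % 2) {
            print vals[(NR + 1) / 2];                      # odd count: the middle value
        } else {
            print (vals[NR / 2] + vals[NR / 2 + 1]) / 2;   # even count: mean of the two middle values
        }
    }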
    

    Sample usage:

    sort -n data_file | awk -f median.awk
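The program above reads only column 1 and expects input that is already sorted. To take the median of some other column of a whitespace-separated file, one option is to extract that column first and then sort it. The sketch below assumes a POSIX shell, and `median_of_column` is a hypothetical helper name; it inlines the same odd/even logic as the awk program.

```shell
#!/bin/sh
# Median of one column of a whitespace-separated file; input need not be sorted.
# Usage (hypothetical helper): median_of_column COLUMN FILE
median_of_column() {
    col=$1
    file=$2
    # 1) print only the chosen column, 2) sort it numerically,
    # 3) buffer the sorted values and pick the middle one(s) in END.
    awk -v col="$col" '{ print $col }' "$file" | sort -n | awk '
        { vals[NR] = $1 }
        END {
            if (NR == 0) exit 1      # empty input: no median
            if (NR % 2) {
                print vals[(NR + 1) / 2]
            } else {
                print (vals[NR / 2] + vals[NR / 2 + 1]) / 2
            }
        }'
}
```

For example, `median_of_column 2 data_file` prints the median of the second column. Because `sort -n` runs inside the helper, the caller does not have to sort the file first.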