
Calculating median value


I have a text file containing thousands of lines, each holding a single numeric value. The values range from -2.5 to 2.5, with one decimal place.

I am using this one-liner to get the lowest value, highest value, median and average:

    awk '{a[i++]=$0;s+=$0}END{print a[0],a[i-1],(a[int(i/2)]+a[int((i-1)/2)])/2,s/i}'

It works otherwise, but I'd like the median as a decimal number with one decimal place. Right now it returns an integer.

Can you help me?

My knowledge of awk is very limited, so perhaps someone more experienced can help.
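Two details of the one-liner matter here. First, a[0] and a[i-1] are simply the first and last lines read, so they are the lowest and highest values only if the file is already sorted; the median indices int(i/2) and int((i-1)/2) assume sorted data too (with 7 values both are 3, giving a[3]; with 6 values they are 3 and 2, giving the mean of a[2] and a[3]). Second, print converts non-integer numbers using OFMT (default "%.6g"), whereas printf "%.1f" always prints exactly one decimal place. A minimal sketch, assuming one number per line and no blank lines, that pipes the data through sort -n and uses printf; the function name stats and the file name values.txt are hypothetical:

```shell
# Hypothetical helper: the question's one-liner, fed sorted input,
# printing the median with exactly one decimal place.
# Assumes one number per line on stdin and no blank lines.
# Usage (values.txt is a placeholder): stats < values.txt
stats() {
  sort -n | awk '
    { a[i++] = $0; s += $0 }           # 0-based array of sorted values, running sum
    END {
      # a[0] and a[i-1] are the min and max only because the input is sorted
      median = (a[int(i/2)] + a[int((i-1)/2)]) / 2
      printf "%s %s %.1f %f\n", a[0], a[i-1], median, s/i
    }'
}
```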


Solution

  • I'm not sure what you mean by "a decimal number with one decimal" (I could not get 0.1 to print as .1, if that's what you wanted). You'll also need a version of awk that supports asort(), which is a GNU awk (gawk) extension. The script below, which I ran today at https://www.jdoodle.com/execute-awk-online (as far as I can tell that is GNU Awk 5.1.1 with API version 3.1), should help:

    Code:

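    # Requires GNU awk (gawk): asort() is a gawk extension.
    BEGIN{
        # Start just outside the expected data range (-2.5 to 2.5)
        highest=-2.6
        lowest=2.6
    }{
        a[NR]=$0+0                    # store as a number so asort() compares numerically
        if($0>highest){highest=$0}
        if($0<lowest){lowest=$0}
        average=average+$0            # running sum; divided by NR in END
    }END{
        n=asort(a)                    # sort ascending; indices become 1..n
        # odd count: middle value; even count: mean of the two middle values
        if(n%2==1){median=a[(n+1)/2]}else{median=(a[n/2]+a[n/2+1])/2}
        # %.1f prints the median with exactly one decimal place
        printf("lowest=%f, highest=%f, average=%f, median=%.1f\n",
        lowest, highest, average/NR, median)
    }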
    

    Input (note: it must not contain blank lines, which the script would treat as extra values):

    -2.5
    -1.4
    0.1
    -0.9
    2.4
    2.3
    2.2
    

    Output:

    lowest=-2.500000, highest=2.400000, average=0.314286, median=0.1
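If gawk is not available, the sorting can be delegated to sort -n, which removes the need for asort() and should let the same calculation run under other POSIX awks such as mawk or BusyBox awk. This sketch also skips blank lines instead of counting them as values. The function name stats_portable is hypothetical, and it assumes one number per line:

```shell
# Hypothetical helper: same report without gawk's asort().
# Sorting is done by sort -n; blank lines are skipped.
# Usage (values.txt is a placeholder): stats_portable < values.txt
stats_portable() {
  sort -n | awk '
    NF == 0 { next }                   # ignore blank lines
    { a[++n] = $1 + 0; sum += $1 }     # 1-based array of sorted values
    END {
      if (n == 0) exit 1               # no data: nothing to report
      if (n % 2 == 1)
        median = a[(n + 1) / 2]                  # odd count: middle value
      else
        median = (a[n / 2] + a[n / 2 + 1]) / 2   # even count: mean of the two middle values
      printf "lowest=%f, highest=%f, average=%f, median=%.1f\n", a[1], a[n], sum / n, median
    }'
}
```

With the sample input above it should print the same line as the gawk script: lowest=-2.500000, highest=2.400000, average=0.314286, median=0.1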