Get median from string of values


I need to find the median of values stored in a string. I have to implement this in bash without any additional temporary files, and I cannot use awk.

I have this string saved in $string:

85 13 4 45 1111 89 87 66 1 5 2 51 13 66 98 50 20 14 18 16 31 21 5175 12

First, I need to sort those values like this:

1 2 4 5 12 13 13 14 16 18 20 21 31 45 50 51 66 66 85 87 89 98 1111 5175

Then I need to find the median of these values:

(21+31) / 2 = 26

How can I achieve this? Is there any efficient way or command available in bash?

My idea:

To sort the values, I could use sort, but I'm not sure how to make it sort values from a string, because it reads from a FILE.

I have no idea how to compute the median, though, so I would appreciate at least a small hint.


Solution

  • To get the numbers from the string into a sorted array, you can print each one on its own line, pipe them to sort -n, and read the result into an array with mapfile:

    string='85 13 4 45 1111 89 87 66 1 5 2 51 13 66 98 50 20 14 18 16 31 21 5175 12'
    mapfile -t arr < <(for num in $string; do echo "$num"; done | sort -n)
    

    The -t option strips the trailing newline from each line read. Notice that you cannot pipe into mapfile directly: by default, each part of a pipeline runs in a subshell, so arr would be empty afterwards.

    It is usually a good idea to quote your variables, but in this case we rely on word splitting and must not quote $string.
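To see the subshell issue in practice, here is a minimal sketch that fills one array through a plain pipeline and one through process substitution. The names piped and piped_count are only illustrative, printf '%s\n' stands in for the for loop, and the behaviour shown assumes bash with its default settings (no shopt -s lastpipe).

```bash
#!/usr/bin/env bash
string='85 13 4 45 1111 89 87 66 1 5 2 51 13 66 98 50 20 14 18 16 31 21 5175 12'

# Pipeline version: mapfile runs in a subshell, so "piped" is lost afterwards
# (default bash behaviour, i.e. without "shopt -s lastpipe").
printf '%s\n' $string | sort -n | mapfile -t piped
piped_count=${#piped[@]}

# Process substitution: mapfile runs in the current shell, so "arr" survives.
# $string is left unquoted on purpose so that it is split into words.
mapfile -t arr < <(printf '%s\n' $string | sort -n)

echo "elements via pipe: $piped_count"                   # 0
echo "elements via process substitution: ${#arr[@]}"     # 24
echo "${arr[*]}"                                         # the sorted values
```

The last line prints the values in the sorted order requested in the question.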

    Now, for the median, there are two cases:

    • There is an odd number of array elements and we just want the value of the middle element.
    • There is an even number of array elements, and we want the mean of the two middle elements.

    The number of array elements is ${#arr[@]}, so we can check that and then decide what to do:

    nel=${#arr[@]}
    if (( nel % 2 == 1 )); then     # Odd number of elements
        val="${arr[ $((nel/2)) ]}"
    else                            # Even number of elements
        val="$(( ( arr[$((nel/2))] + arr[$((nel/2-1))] ) / 2 ))"
    fi
    printf "%d\n" "$val"
    

    This relies on integer arithmetic: if we have an odd number of elements, say three, the index of the median is 1, which we get from integer division of three by two. For an even number of elements, say four, we want the elements at index 1 and 2, which we get by dividing four by two for the higher index and subtracting one from it for the lower index.
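Putting the sorting and the index arithmetic together, the whole thing could be wrapped in a function. This is only a sketch: median is a hypothetical helper name, it assumes plain decimal integers without leading zeros (bash arithmetic would read those as octal), and it keeps the truncating integer division from the snippet above.

```bash
#!/usr/bin/env bash
# median: hypothetical helper; prints the integer median of its arguments.
median() {
    (( $# > 0 )) || return 1                  # no values, no median
    local -a sorted
    mapfile -t sorted < <(printf '%s\n' "$@" | sort -n)
    local n=${#sorted[@]}
    if (( n % 2 == 1 )); then                 # odd count: middle element
        echo "${sorted[n/2]}"
    else                                      # even count: mean of the two middle elements
        echo $(( (sorted[n/2 - 1] + sorted[n/2]) / 2 ))
    fi
}

string='85 13 4 45 1111 89 87 66 1 5 2 51 13 66 98 50 20 14 18 16 31 21 5175 12'
median $string      # unquoted on purpose; prints 26
median 3 1 2        # prints 2
median 4 1 3 2      # prints 2, since (2+3)/2 is truncated
```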

    If the two middle elements add up to an odd number, integer division truncates the result, so for positive values it is rounded down. If that's not good enough, we can either check whether the sum is odd and manually append .5 to the result, or we can use bc to do the calculation. Consider:

    $ echo $(( 11/2 ))
    5
    $ bc <<< 'scale=1; 11/2'
    5.5
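The first option, appending .5 by hand, might look like the sketch below. mean_of_two is a hypothetical helper name, and the approach assumes both values are non-negative integers; for a negative sum the truncated quotient would need different handling.

```bash
#!/usr/bin/env bash
# mean_of_two: hypothetical helper; prints the mean of two non-negative
# integers, appending ".5" when their sum is odd.
mean_of_two() {
    local sum=$(( $1 + $2 ))
    if (( sum % 2 )); then
        echo "$(( sum / 2 )).5"     # odd sum: integer part plus .5
    else
        echo $(( sum / 2 ))         # even sum: exact integer result
    fi
}

mean_of_two 21 31   # prints 26
mean_of_two 5 6     # prints 5.5
```

With the sorted array from above, the even case would then call mean_of_two "${arr[nel/2-1]}" "${arr[nel/2]}" instead of doing the division inline.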