r mean outliers

What is the meaning of 50 in the command "mean(c(1:10, 50))"?


I have tried using different numbers in place of 50 and got different answers. Can anyone explain how this number enters the calculation?


Solution

  • There is no "calculation" behind this number. Quite simply, c() creates a vector:

    > c(1:10, 50)
     [1]  1  2  3  4  5  6  7  8  9 10 50
    

    and mean() returns the mean of this vector (the sum divided by the length). If you vary the number, the mean also varies.
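
    As a minimal sketch of that definition, the same value can be reproduced by hand with sum() and length(). The variable names x, total, n and manual_mean are only illustrative and do not appear in the question:

    ```r
    # x is an illustrative name for the vector from the question
    x <- c(1:10, 50)

    total <- sum(x)           # 1 + 2 + ... + 10 + 50 = 105
    n <- length(x)            # 11 elements
    manual_mean <- total / n  # sum divided by length

    # Both lines print the same value, about 9.545455
    print(manual_mean)
    print(mean(x))
    ```

    Replacing 50 with another number changes only the sum, not the length, which is why the result moves with it.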

    In statistics, a number like 50 here is known as an outlier. One way to obtain an average (or, formally, a measure of central tendency) that's robust to outliers is by computing the median:

    > median(c(1:10, -100))
    [1] 5
    > median(c(1:10, 50))
    [1] 6
    > median(c(1:10, 5000))
    [1] 6
    

    Compare this with the means of the same vectors:

    > mean(c(1:10, -100))
    [1] -4.090909
    > mean(c(1:10, 50))
    [1] 9.545455
    > mean(c(1:10, 5000))
    [1] 459.5455
    

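    This example shows how a single outlying observation can greatly affect the mean but barely moves the median. With 11 values the median is simply the sixth value in sorted order, so it shifts only between 5 and 6 depending on which side the outlier falls.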
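
    The same comparison can be run in one go. The sketch below uses only base R and applies both functions to the three vectors used above; the names outliers, summarise_one and result are hypothetical helpers introduced for illustration:

    ```r
    # The three extra values tried in the answer
    outliers <- c(-100, 50, 5000)

    # Hypothetical helper: append one extra value to 1:10 and summarise it
    summarise_one <- function(extra) {
      x <- c(1:10, extra)
      c(outlier = extra, mean = mean(x), median = median(x))
    }

    # sapply() returns one column per outlier; t() puts one outlier per row
    result <- t(sapply(outliers, summarise_one))
    print(result)
    ```

    Each row of result holds one outlier with its mean and median, so the two columns can be compared side by side.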