R: excluding outliers in statistical data


I have data for bird species in which I am comparing wing length and weight across years and against each other. I noticed that some of the data received from ringing stations contains inaccurate entries. For example, for one species the wing lengths all fall between 40 and 60 mm, but there is one outlier at 578 mm, which must be the result of an input error. Is it possible to exclude such extreme outliers from the data set?


Solution

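  • You can remove these values from your data frame by keeping only the rows at or below a cut-off, with something like

    df <- df[which(df$wing_length <= 500), ]

    Selecting the rows to keep is safer than df[-which(df$wing_length > 500), ]: if no value exceeds the cut-off, which() returns integer(0) and the negative index drops every row. Note that rows with a missing wing_length are dropped by this form as well.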

    An example:

    > df <- data.frame(a=1:10, b=11:20)
    > df
        a  b
    1   1 11
    2   2 12
      ...
    9   9 19
    10 10 20
    > df <- df[which(df$a <= 5), ]
    > df
      a  b
    1 1 11
    2 2 12
    3 3 13
    4 4 14
    5 5 15
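
Applied to data shaped like the question's, the same idea might look like the sketch below. The data frame, its column names (species, wing_length, weight) and the 500 mm cut-off are all made up for illustration; they are assumptions, not the asker's actual data. The sketch keeps rows with a missing wing length rather than dropping them, and it also shows what happens to the negative-index form when nothing exceeds the cut-off.

```r
# Hypothetical ringing records; names and values are invented for illustration.
birds <- data.frame(
  species     = c("A", "A", "A", "B", "B", "B"),
  wing_length = c(52, 578, 48, 61, NA, 59),  # 578 mimics a data-entry error
  stringsAsFactors = FALSE
)

max_plausible <- 500  # assumed cut-off in mm; choose one that suits the species

# Keep rows at or below the cut-off, and keep rows with a missing value too.
keep  <- is.na(birds$wing_length) | birds$wing_length <= max_plausible
clean <- birds[keep, ]

# Pitfall of the negative-index form: when nothing exceeds the cut-off,
# which() returns integer(0) and df[-integer(0), ] selects zero rows.
no_outliers <- data.frame(wing_length = c(41, 55, 60), weight = c(10, 11, 12))
dropped_all <- no_outliers[-which(no_outliers$wing_length > max_plausible), ]
kept_all    <- no_outliers[no_outliers$wing_length <= max_plausible, ]

nrow(clean)        # 5
nrow(dropped_all)  # 0
nrow(kept_all)     # 3
```

Whether a missing measurement should be kept or dropped depends on the analysis; drop the is.na() term if those rows are not wanted.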