Tags: r, dataframe, outliers

Outlier cutoff in R


I am trying to cut off the outliers of a variable in a data frame, but the code does not behave as expected:

outlier_cutoff1 <- quantile(myd$nov, 0.75) + 1.5 * IQR(myd$nov)
index_outlier1 <- which(myd$nov > outlier_cutoff1)
mydnov <- myd[-index_outlier1, ]

This code does not throw an error, but it does not change the outlier values.


Solution

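  • There is no need for which() here: the logical vector from myd$nov > outlier_cutoff1 can index the rows directly. It is also safer, because when no value exceeds the cutoff, which() returns integer(0), and myd[-integer(0), ] returns zero rows instead of the whole data frame.

    Looking at your code, I think you can remove the "outliers" with the following. Note that the logical index must be negated with !, not -: -TRUE is -1 and -FALSE is 0, so myd[-index_outlier1, ] would drop only the first row (or all rows, if nothing is flagged).

    outlier_cutoff1 <- quantile(myd$nov, 0.75) + 1.5 * IQR(myd$nov)
    index_outlier1 <- (myd$nov > outlier_cutoff1)
    mydnov <- myd[!index_outlier1, ]  # keep the rows that are not above the cutoff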
    

    Here's a reproducible example with a vector:

    set.seed(123)
    nov <- rnorm(500)
    
    outlier_cutoff1 <- quantile(nov, 0.75) + 1.5 * IQR(nov)
      #This is 2.574977 
    index_outlier1 <- nov > outlier_cutoff1
      # Logical vector: TRUE where the value is greater than 2.574977, FALSE otherwise
    
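    mydnov <- nov[!index_outlier1]
      # Keeps only the values that are not above the cutoff
    
    length(nov)  # 500
    
    length(mydnov)  # 500 - sum(index_outlier1): exactly the flagged values are removed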
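The indexing behaviour behind this fix can be checked on a small made-up vector x (illustrative only, not from the question) where the flagged position is known in advance. The sketch contrasts negating a logical index with - versus !, and shows the empty-match pitfall of -which():

```r
x <- c(10, 20, 30, 40, 50)

# Flag values above 45: only the last element (50)
flagged <- x > 45

# Wrong: -flagged is c(0, 0, 0, 0, -1), so element 1 is dropped, not element 5
wrong <- x[-flagged]              # 20 30 40 50

# Right: ! keeps exactly the positions that are FALSE
right <- x[!flagged]              # 10 20 30 40

# -which() also works, as long as at least one value is flagged
via_which <- x[-which(flagged)]   # 10 20 30 40

# ...but when nothing is flagged, which() returns integer(0),
# -integer(0) is still integer(0), and the result is empty
none <- x > 100
empty <- x[-which(none)]          # numeric(0)
kept <- x[!none]                  # 10 20 30 40 50
```

The same rules apply to the row index of a data frame, so myd[!index_outlier1, ] behaves like x[!flagged].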
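The cutoff above only trims the upper tail. As a sketch of a more general version, assuming the usual Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) are what is wanted, the helpers below drop rows outside both fences, or alternatively keep every row and cap the values at the fences, which may be closer to "changing the outlier values" as the question puts it. The function names iqr_fences, remove_iqr_outliers and cap_iqr_outliers and the myd_demo data are hypothetical, made up for illustration:

```r
# Hypothetical helpers, not part of the question or answer above.
# Tukey fences: [Q1 - k * IQR, Q3 + k * IQR], with k = 1.5 as in the question.
iqr_fences <- function(v, k = 1.5) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]  # same value as IQR(v, na.rm = TRUE)
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Drop rows whose value in `col` lies outside the fences.
# NA values are kept rather than silently turned into all-NA rows.
remove_iqr_outliers <- function(df, col, k = 1.5) {
  f <- iqr_fences(df[[col]], k)
  v <- df[[col]]
  keep <- is.na(v) | (v >= f[["lower"]] & v <= f[["upper"]])
  df[keep, , drop = FALSE]
}

# Alternative: keep every row but clamp the values into the fences.
# pmin/pmax leave NA as NA.
cap_iqr_outliers <- function(df, col, k = 1.5) {
  f <- iqr_fences(df[[col]], k)
  df[[col]] <- pmax(pmin(df[[col]], f[["upper"]]), f[["lower"]])
  df
}

# Small illustrative data frame (made up for this sketch)
myd_demo <- data.frame(id = 1:8, nov = c(1, 2, 3, 4, 5, 6, 7, 100))

iqr_fences(myd_demo$nov)                     # lower -2.5, upper 11.5
nrow(remove_iqr_outliers(myd_demo, "nov"))   # 7: the row with 100 is dropped
cap_iqr_outliers(myd_demo, "nov")$nov        # 100 becomes 11.5
```

Removing rows changes the sample size, while capping keeps it; which one fits depends on what the data frame is used for afterwards.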