r outliers percentile quantile

Does trimming 2% of scores from top and bottom each leave me with quantiles .02 - .98?


If you trim 2% from both the top and the bottom of a dataset, for a 4% total trim, you are left with the middle 96% of scores. Does this mean the remaining scores range from the .02 quantile to the .98 quantile of the original dataset?

If this is incorrect, how would I trim so as to have only data from the .02 quantile to the .98 quantile?

I am using R and want to trim outliers this way.


Solution

  • Indeed, the 0.02 probability quantile, or second percentile, is the value below which 2% of your data is found.

    To obtain the data between the 2nd and the 98th percentiles, you can use the quantile function:

    # Random samples from a normal distribution
    x <- rnorm(1000)
    # 2nd and 98th percentiles of x
    q <- quantile(x, probs = c(0.02, 0.98))
    # Keep only the samples strictly between the two quantiles
    x2 <- x[x > q[1] & x < q[2]]
    

    Edit: regarding cleaning of outliers, you might want to check the comments to this answer to a similar question. The gist is that simply removing a fixed percentage of your data to get rid of outliers is probably wrong, since it discards the same share of points whether or not any of them are actually anomalous.
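
To check the original question directly, here is a minimal sketch comparing the quantile filter with a plain rank-based trim that drops the lowest and highest 2% of the sorted values. The seed, the sample size, and the names `x_quant`, `x_rank` and `k` are assumptions made for illustration. It relies on R's default quantile definition (`type = 7`) and on continuous data without ties. With other quantile types, tied values, or sample sizes where 2% is not a whole number of observations, the two approaches can differ by a few points at the boundaries.

```r
# Illustrative comparison; the seed and variable names are assumptions.
set.seed(42)
x <- rnorm(1000)
n <- length(x)

# Approach 1: keep values strictly between the 2nd and 98th percentiles
q <- quantile(x, probs = c(0.02, 0.98))
x_quant <- x[x > q[1] & x < q[2]]

# Approach 2: sort, then drop the k smallest and k largest values
k <- floor(0.02 * n)
x_rank <- sort(x)[(k + 1):(n - k)]

# Compare the sizes and the contents of the two results
length(x_quant)
length(x_rank)
identical(sort(x_quant), x_rank)
```

If the two results match, trimming 2% from each end and filtering between the .02 and .98 quantiles are doing the same job for that sample. Note that `x_rank` comes back sorted, while `x_quant` keeps the original order of `x`.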