
Remove outlier from five-number summary statistics


How can I stop the fivenum function from reporting outliers as my maximum/minimum values?

I want to see the upper and lower whisker values labeled on my boxplot.

My code:

boxplot(data$`Weight(g)`)
text(y=fivenum(data$`Weight(g)`),labels=fivenum(data$`Weight(g)`),x=1.25, title(main = "Weight(g)"))



Solution

  • boxplot returns a named list whose components you can use to exclude outliers before calling fivenum:

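    • $out contains the outlier values themselves. It can be tempting to use setdiff(data$`Weight(g)`, bp$out) (with bp the value returned by boxplot), but that relies on exact floating-point equality, which may be fragile (see R FAQ 7.31), and setdiff() also drops duplicated values, which shifts the quartiles. So I recommend against this; instead,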

    • $stats holds the five values the box is actually drawn from (lower whisker end, lower hinge, median, upper hinge, upper whisker end), with outliers already excluded. I suggest we work with this.
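A minimal sketch of the difference between the two components, using a made-up vector x with repeated values (not the asker's data). It calls boxplot.stats(), the helper boxplot() uses to compute these statistics, so nothing is drawn:

```r
# Made-up data: repeated 12s and 15s, plus two outliers (1 and 30)
x  <- c(1, 10, 12, 12, 13, 14, 15, 15, 16, 18, 20, 30)
st <- boxplot.stats(x)   # same statistics boxplot() returns, without plotting

st$out     # 1 30
st$stats   # 10.0 12.0 14.5 17.0 20.0 (whisker ends, hinges, median)

# setdiff() de-duplicates, so the repeated values collapse
length(setdiff(x, st$out))   # 8
# Filtering on the whisker range keeps every non-outlier observation
kept <- x[x >= st$stats[1] & x <= st$stats[5]]
length(kept)                 # 10
```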

    (BTW, title(.) does its work by side effect, drawing on the current plot, and text(.) makes no use of its return value, so I suggest calling it separately.)
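Applied to the question's own code, that means giving title() its own line. A minimal sketch, assuming a hypothetical data frame named data whose `Weight(g)` values are made up for illustration, and using the whisker-range filter this answer recommends:

```r
# Hypothetical stand-in for the asker's data frame (values made up);
# check.names = FALSE keeps the column name `Weight(g)` as written
data <- data.frame(`Weight(g)` = c(95, 98, 100, 101, 103, 104, 106, 110, 160),
                   check.names = FALSE)
w <- data$`Weight(g)`

bp <- boxplot(w)            # draws the plot and returns its statistics
title(main = "Weight(g)")   # separate call: title() draws as a side effect, returns NULL

# Drop points outside the whisker range, then recompute the summary
five <- fivenum(w[w >= min(bp$stats) & w <= max(bp$stats)])
text(x = 1.25, y = five, labels = five)
five   # 95 99 102 105 110
```

Passing main = "Weight(g)" to boxplot() would also set the title in a single call.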

    Reproducible data/code:

    vec <- c(1, 10:20, 30)
    bp <- boxplot(vec)
    str(bp)
    # List of 6
    #  $ stats: num [1:5, 1] 10 12 15 18 20
    #  $ n    : num 13
    #  $ conf : num [1:2, 1] 12.4 17.6
    #  $ out  : num [1:2] 1 30
    #  $ group: num [1:2] 1 1
    #  $ names: chr "1"
    
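    # keep only values inside the whisker range, then recompute the summary
    five <- fivenum(vec[ vec >= min(bp$stats) & vec <= max(bp$stats)])
    text(x=1.25, y=five, labels=five)   # labels to the right of the box
    title("Weight(g)")                  # separate call; title() draws as a side effect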
    

    (Image: basic boxplot with corrected fivenum labels)
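Note that fivenum() recomputes the hinges on the trimmed data, so with this vec the middle labels are 12.5 and 17.5 while the box edges are drawn at 12 and 18. If the goal is to label exactly what the plot shows, one option (a sketch on the same example vector, not the only approach) is to print bp$stats directly:

```r
vec <- c(1, 10:20, 30)
bp  <- boxplot(vec)
title("Weight(g)")

# Label the five values actually drawn: whisker ends, hinges, median
drawn <- bp$stats[, 1]   # $stats is a 5 x (number of boxes) matrix
text(x = 1.25, y = drawn, labels = drawn)

# For comparison: fivenum() of the trimmed data shifts the hinges
trimmed <- fivenum(vec[vec >= min(bp$stats) & vec <= max(bp$stats)])
drawn     # 10 12 15 18 20
trimmed   # 10.0 12.5 15.0 17.5 20.0
```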