Search code examples
rsubsetboxplotsummaryquantile

Get boxplot stats of a column separated by values in another column in dataframe in R


I have a data frame like this:

distance exclude   
1.1    F
1.5    F
3      F
2      F
1      F
5      T
3      F
63     F
32     F
21     F
15     F
1      T

I want get the four boxplot stats of each segment of data in distance column separated by "T" in exclude column, here "T" serves as separator.

Can anyone help? Thanks so much!


Solution

  • First, let's create some fake data:

    library(dplyr)
    
    # Fake data
    set.seed(49349)
    dat = data.frame(distance=rnorm(500, 50, 10), 
                     exclude=sample(c("T","F"), 500, replace=TRUE, prob=c(0.03,0.95)))
    

    Now create a new group each time exclude = "T". Then, for each group, and calculate whatever statistics you wish and return the results in a data frame:

    box.stats = dat %>% 
      mutate(group = cumsum(exclude=="T")) %>%
      group_by(group) %>%
      do(data.frame(n=length(.$distance), 
                    out_90 = sum(.$distance > quantile(.$distance, 0.9)),
                    out_10 = sum(.$distance < quantile(.$distance, 0.1)),
                    MEAN = round(mean(.$distance),2), 
                    SD = round(sd(.$distance),2),
                    out_2SD_high = sum(.$distance > mean(.$distance) + 2*sd(.$distance)),
                    round(t(quantile(.$distance, probs=c(0,0.1,0.25,0.5,0.75,0.9,1))),2)))
    
    names(box.stats) = gsub("X(.*)\\.$", "p\\1", names(box.stats))
    
    box.stats
    
       group     n out_90 out_10  MEAN    SD out_2SD_high    p0   p10   p25   p50   p75   p90  p100
    1      0    15      2      2 46.21  8.78            0 28.66 36.03 41.88 46.04 52.33 56.30 61.98
    2      1    36      4      4 50.03 10.01            0 21.71 38.78 44.63 51.13 56.66 61.58 67.84
    3      2    80      8      8 50.36  9.00            1 20.30 38.10 45.95 51.28 56.51 61.74 70.44
    4      3     9      1      1 55.62  8.58            0 42.11 47.10 49.19 54.54 63.63 65.84 67.88
    5      4    16      2      2 47.70  7.79            0 29.03 39.89 43.60 49.26 52.92 56.97 58.02
    6      5    66      7      7 49.86  9.93            2 24.84 36.00 45.05 50.51 55.65 61.41 75.27
    7      6    44      5      5 50.35 10.39            1 31.72 36.36 43.49 50.95 55.78 64.88 73.64
    8      7    80      8      8 49.18  9.24            1 27.62 37.86 42.06 50.34 56.60 59.66 72.13
    9      8    31      3      3 52.56 11.18            0 25.78 39.94 44.10 51.32 62.02 66.35 70.40
    10     9    60      6      6 50.31  9.82            1 25.43 37.44 44.53 50.31 56.78 62.36 71.77
    11    10    33      4      4 49.99  9.78            2 32.74 38.72 42.56 49.60 55.75 62.86 72.20
    12    11    30      3      3 48.26 11.47            1 30.03 37.68 40.24 45.65 55.42 60.18 79.36