
Summarize data with two functions in dplyr


Consider this example data frame:

library(dplyr)  # provides %>%, group_by(), summarise() and inner_join()

d <- read.table(text="
  trt rep y  
  1   1   30   
  1   1   50   
  1   1   70   
  1   2   0   
  1   2   0   
  1   2   0   
  2   1   10   
  2   1   0   
  2   1   0   
  2   2   5   
  2   2   0   
  2   2   .   
  "
  , header = TRUE, check.names = FALSE, na.strings = ".")  # "." is read as NA

I'm trying to build a summary table that computes two statistics of the "y" variable.

The first new column should hold the simple mean of y for each trt/rep combination:

by_rep1 = d %>% 
  group_by(trt, rep) %>%
  summarise(mean_y = mean(na.omit(y)))
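Here mean(na.omit(y)) drops the missing values before averaging. As far as I know this gives the same result as mean(y, na.rm = TRUE), the form used in the solution. A quick check on the trt 2 / rep 2 values from the example (the "." entry becomes NA):

```r
# Values of y for trt == 2, rep == 2 in the example data ("." was read as NA)
y <- c(5, 0, NA)

# Two equivalent ways to average while ignoring the missing value
m1 <- mean(na.omit(y))        # drop the NAs first, then average
m2 <- mean(y, na.rm = TRUE)   # let mean() skip the NAs itself

print(c(m1, m2))              # both are 2.5
print(mean(y))                # without either, the result is NA
```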

The second column should hold the proportion of positive y values for each trt/rep combination:

by_rep2 = d %>% 
  group_by(trt, rep) %>%
  # summarise_each()/funs() are deprecated in current dplyr; plain summarise() does the same here
  summarise(y = round(mean(y > 0, na.rm = TRUE), 2))
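The proportion works because comparing y > 0 yields a logical vector, and mean() treats TRUE as 1 and FALSE as 0, so the mean is the share of positive values. A small base-R sketch using the trt 2 values from the example:

```r
# y for trt == 2, rep == 1 in the example data
y <- c(10, 0, 0)

is_pos <- y > 0               # logical vector: TRUE FALSE FALSE
prop   <- mean(is_pos)        # TRUE counts as 1, FALSE as 0, so 1/3
print(round(prop, 2))         # 0.33

# With a missing value, na.rm = TRUE also drops it from the denominator
y2 <- c(5, 0, NA)             # trt == 2, rep == 2
print(mean(y2 > 0, na.rm = TRUE))  # 1 positive out of 2 non-missing: 0.5
```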

For now I'm doing it the long way, computing the two summaries separately and then joining them, because I don't know how to do it in one step:

inner_join(by_rep1, by_rep2, by = c("trt", "rep"))  

#    trt   rep    mean_y     y
#  (int) (int)     (dbl) (dbl)
#1     1     1 50.000000  1.00
#2     1     2  0.000000  0.00
#3     2     1  3.333333  0.33
#4     2     2  2.500000  0.50

Does anyone know how to do this in a single step, combining both functions?


Solution

  • You can put both expressions into a single summarise() call:

    d %>% group_by(trt, rep) %>% summarise(mean_y = mean(y, na.rm = TRUE), 
                                           y = round(mean(y > 0, na.rm = TRUE), 2))
    Source: local data frame [4 x 4]
    Groups: trt [?]
    
        trt   rep    mean_y     y
      (int) (int)     (dbl) (dbl)
    1     1     1 50.000000  1.00
    2     1     2  0.000000  0.00
    3     2     1  3.333333  0.33
    4     2     2  2.500000  0.50
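A sketch of the same single-step summary written for more recent dplyr (the .groups argument assumes dplyr 1.0.0 or later). One detail worth knowing: inside summarise(), later expressions see the results of earlier ones, so mean_y has to be computed before y is overwritten with the proportion. Swapping the two lines would make mean_y average the proportion instead of the raw values.

```r
library(dplyr)

# Same example data as in the question; "." is read as NA
d <- read.table(text = "
  trt rep y
  1   1   30
  1   1   50
  1   1   70
  1   2   0
  1   2   0
  1   2   0
  2   1   10
  2   1   0
  2   1   0
  2   2   5
  2   2   0
  2   2   .
", header = TRUE, na.strings = ".")

res <- d %>%
  group_by(trt, rep) %>%
  summarise(
    mean_y  = mean(y, na.rm = TRUE),               # still sees the raw y values
    y       = round(mean(y > 0, na.rm = TRUE), 2), # then overwrites y with the proportion
    .groups = "drop"                               # return an ungrouped tibble
  )

print(res)
```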