Search code examples
rdataframesummary

Efficiently create summary table with min, max, mean, SE and sample size by category


I have a dataset that shows the walking speed of different age classes, measured in two different periods. Here is a reproducible example:

Period<-rep(c(1,2),times=c(9,18))
Class<-rep(c("child","teen","adult","toddler","child","teen","adult"),times=c(2,3,4,3,4,5,6))
Speed<-c(2,3,3,4,5,5,4,4,6,1,0.7,0.3,1,2,3,2,2,5,5,4,2,3,5,4,6,5,4)
data<-data.frame(Period,Class,Speed)

   Period   Class Speed
1       1   child   2.0
2       1   child   3.0
3       1    teen   3.0
4       1    teen   4.0
5       1    teen   5.0
6       1   adult   5.0
7       1   adult   4.0
8       1   adult   4.0
9       1   adult   6.0
10      2 toddler   1.0
11      2 toddler   0.7
12      2 toddler   0.3
13      2   child   1.0
14      2   child   2.0
15      2   child   3.0
16      2   child   2.0
17      2    teen   2.0
18      2    teen   5.0
19      2    teen   5.0
20      2    teen   4.0
21      2    teen   2.0
22      2   adult   3.0
23      2   adult   5.0
24      2   adult   4.0
25      2   adult   6.0
26      2   adult   5.0
27      2   adult   4.0

I want to create a summary table with the following attributes:

  1. The minimum, maximum and average speed by age class and by period
  2. The standard error of the speed by age class by period
  3. The number of individuals per class by period

I would also like to add three "total" rows:

  1. minimum, maximum, average speeds, standard errors and number of all individuals in Period 1
  2. The same for period 2
  3. The same over both periods

The resulting summary table should look like this:

 Class         Period  mean   min   max    SE     n
   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 adult              1 4.75    4       6 0.479     4
 2 adult              2 4.5     3       6 0.428     2
 3 child              1 2.5     2       3 0.5       3
 4 child              2 2       1       3 0.408     6
 5 teen               1 4       3       5 0.577     4
 6 teen               2 3.6     2       5 0.678     5
 7 toddler            2 0.667   0.3     1 0.203     3
 8 Total.Period2      2 4       2       6 0.408     9
 9 Total.Period1      1 3.06    0.3     6 0.414    18
10 Total             NA 3.37    0.3     6 0.315    27

I can currently do this using functions from the tidyverse and rbind() in R. I've also been using the function std.error from the package plotrix() to calculate the standard errors. However, this is quite a lengthy process and I'd like to make it more efficient. Any suggestions?


Solution

  • Try this to calculate the summary:

    library(tidyverse)
    rbind(data %>% group_by(Class, Period) %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup(),
          cbind("Class" = c("Period.1","Period.2"),data %>% group_by(Period) %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup()),
          cbind("Class" = "Total", "Period" = NA,data %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup()))
    
    
    

    As far as I know, this is the fastest way to do it