
How to build a two-way table summarizing a third variable in R (kable package)


I am working with RMarkdown and trying to use the kable() function (from the knitr package). I have a three-variable data frame: gender (factor), age group (factor), and test score (numeric). I want to create two-way tables with the factor variables (gender and age group) as table rows and columns, and summary statistics of the test scores as cell contents. These summary statistics are the mean, standard deviation, and percentiles (median, 1st decile, 9th decile, and 99th percentile). Is there an easy way of building those tables in a nicely formatted way (as kable does), without needing to enter all those values into a matrix first? I searched the kable help file but could not find how to do it.

# What my data looks like:

gender <- rep(c(rep(c("M", "F"), each=3)), times=3)
age <- as.factor(rep(seq(10,12, 1), each=6))
score <- c(4,6,8,4,8,9,6,6,9,7,10,13,8,9,13,12,14,16)
testdata <-data.frame(gender,age,score)


| gender | age | score |
|--------|-----|-------|
| M      | 10  | 4     |
| M      | 10  | 6     |
| M      | 10  | 8     |
| F      | 10  | 4     |
| F      | 10  | 8     |
| F      | 10  | 9     |
| M      | 11  | 6     |
| M      | 11  | 6     |
| M      | 11  | 9     |
| F      | 11  | 7     |
| F      | 11  | 10    |
| F      | 11  | 13    |
| M      | 12  | 8     |
| M      | 12  | 9     |
| M      | 12  | 13    |
| F      | 12  | 12    |
| F      | 12  | 14    |
| F      | 12  | 16    |

I would like a table that looks like the one below (but calculated directly from my dataset and in a polished, publication-ready format):

      Mean score by gender & age
|        | 10yo | 11yo | 12yo | Total |
|--------|:----:|:----:|:----:|:-----:|
| Male   |   6  |   7  |  10  |  7.7  |
| Female |   7  |  10  |  14  |  10.3 |
| Total  |  6.5 |  8.5 |  12  |   9   |

I tried kable, which did give me nicely formatted tables, but I am only able to produce frequency tables with it; I cannot find any argument for summarizing a variable. If anyone can suggest a better package for building a table like the one specified above, I would appreciate it a lot.

library(knitr)
library(kableExtra)
kable(testdata, "latex", booktabs = TRUE) %>%
   kable_styling(latex_options = "striped")

Solution

  • Absent a reproducible example, multi-way tables including a variety of statistics can be created with the tables::tabular() function.

    Here is an example from the tables documentation (page 38) that combines multiple variables in a table of means and standard deviations.

    set.seed(1206)
    
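    # Simulated data: p and a are grouping variables, b and c are numeric
    q <- data.frame(p = rep(c("A", "B"), each = 10, length.out = 30),
                    a = rep(c(1, 2, 3), each = 10),
                    id = seq(30),
                    b = round(runif(30, 10, 20)),
                    c = round(runif(30, 40, 70)))
    library(tables)
    # Rows: every p-by-a combination plus an overall row;
    # columns: the count N, then the mean and sd of b and of c
    tab <- tabular((Factor(p) * Factor(a) + 1) ~ (N = 1) + (b + c) * (mean + sd),
                   data = q)
    # Keep only the rows whose first column (N) is non-zero
    tab[tab[, 1] > 0, ]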
    

    A Stack Overflow-friendly version of the output is:

              b           c          
     p a   N  mean  sd    mean  sd   
     A 1   10 14.40 3.026 55.70 6.447
       3   10 14.50 2.877 52.80 8.954
     B 2   10 14.40 2.836 56.30 7.889
       All 30 14.43 2.812 54.93 7.714
    

    One can render the table to HTML with the html() function. When rendered in a browser, the output of the following code looks like the illustration below.

    html(tab[ tab[,1] > 0, ])
    

    [Image: the table above rendered as HTML in a browser]

    The tables package can also calculate other statistics, including quantiles. For details on quantile calculations, see pp. 29-30 of the tables package manual.

    The package also works with knitr, its kable() function, and kableExtra.
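
For the data in the question, the two-way table of a summary statistic can also be computed directly and handed to kable(). The following is a minimal sketch in base R rather than the tables approach described above. two_way_summary() is a hypothetical helper written for this illustration, not a function from knitr, kableExtra, or tables, and it assumes the "Total" margins should be computed from the raw scores rather than by averaging the cell values.

```r
# Data from the question
gender <- rep(rep(c("M", "F"), each = 3), times = 3)
age    <- factor(rep(10:12, each = 6))
score  <- c(4, 6, 8, 4, 8, 9, 6, 6, 9, 7, 10, 13, 8, 9, 13, 12, 14, 16)
testdata <- data.frame(gender = factor(gender, levels = c("M", "F")),
                       age, score)

# Hypothetical helper (an assumption, not a package function): returns a
# gender-by-age matrix of FUN(score) with "Total" margins. FUN must return
# a single number; extra arguments in ... are passed on to FUN.
two_way_summary <- function(data, FUN, ...) {
  cells     <- tapply(data$score, list(data$gender, data$age), FUN, ...)
  row_total <- tapply(data$score, data$gender, FUN, ...)
  col_total <- tapply(data$score, data$age, FUN, ...)
  overall   <- unname(FUN(data$score, ...))
  rbind(cbind(cells, Total = row_total),
        Total = c(col_total, overall))
}

# Mean score by gender and age, and the 9th decile as a second example
mean_tab <- two_way_summary(testdata, mean)
p90_tab  <- two_way_summary(testdata, quantile, probs = 0.9)

# Print a formatted table if knitr is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(round(mean_tab, 1),
                     caption = "Mean score by gender and age"))
}
```

Inside an R Markdown chunk, calling knitr::kable(round(mean_tab, 1), "latex", booktabs = TRUE) and piping the result to kable_styling(), as in the question, should give the styled LaTeX version. Passing a different function, such as sd or median, yields the other summary tables.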