When using the expss package in R for creating tables, how does one get the row_percentages to be calculated within a nested variable? In the example below, I would like the row percentage to be calculated within each time period. Thus, I would like the row percentage to sum to 100% within each time period (2015-2016 and 2017-2018). Now however, the percentage is calculated over the entire row.
library(expss)
data(mtcars)
mtcars$period <- "2015-2016"
mtcars <- rbind(mtcars, mtcars)
mtcars$period[33:64] <- "2017-2018"
mtcars = apply_labels(mtcars,
cyl = "Number of cylinders",
am = "Transmission",
am = c("Automatic" = 0,
"Manual"=1),
period = "Measurement period"
)
mtcars %>%
tab_cells(cyl) %>%
tab_cols(period %nest% am) %>%
tab_stat_rpct(label = "row_perc") %>%
tab_pivot()
Created on 2019-09-28 by the reprex package (v0.3.0)
| | | | Measurement period | | | |
| | | | 2015-2016 | | 2017-2018 | |
| | | | Transmission | | Transmission | |
| | | | Automatic | Manual | Automatic | Manual |
| ------------------- | ------------ | -------- | ------------------ | ------ | ------------ | ------ |
| Number of cylinders | 4 | row_perc | 13.6 | 36.4 | 13.6 | 36.4 |
| | 6 | row_perc | 28.6 | 21.4 | 28.6 | 21.4 |
| | 8 | row_perc | 42.9 | 7.1 | 42.9 | 7.1 |
| | #Total cases | row_perc | 19.0 | 13.0 | 19.0 | 13.0 |
I believe this is what you are after:
library(expss)
data(mtcars)
mtcars$period <- "2015-2016"
mtcars <- rbind(mtcars, mtcars)
mtcars$period[33:64] <- "2017-2018"
mtcars = apply_labels(mtcars,
cyl = "Number of cylinders",
am = "Transmission",
am = c("Automatic" = 0,
"Manual"=1),
period = "Measurement period"
)
mtcars %>%
tab_cells(cyl) %>%
tab_cols(period %nest% am ) %>%
tab_subgroup(period =="2015-2016") %>%
tab_stat_rpct(label = "row_perc") %>%
tab_subgroup(period =="2017-2018") %>%
tab_stat_rpct(label = "row_perc") %>%
tab_pivot(stat_position = "inside_rows")
Pay attention to the use of tab_subgroup()
which determines which subgroup of year period we want to calculate the percentage as well as to stat_position = "inside_rows"
which determines where we want to put the calculated output in the final table.
Output:
| | | | Measurement period | | | |
| | | | 2015-2016 | | 2017-2018 | |
| | | | Transmission | | Transmission | |
| | | | Automatic | Manual | Automatic | Manual |
| ------------------- | ------------ | -------- | ------------------ | ------ | ------------ | ------ |
| Number of cylinders | 4 | row_perc | 27.3 | 72.7 | | |
| | | | | | 27.3 | 72.7 |
| | 6 | row_perc | 57.1 | 42.9 | | |
| | | | | | 57.1 | 42.9 |
| | 8 | row_perc | 85.7 | 14.3 | | |
| | | | | | 85.7 | 14.3 |
| | #Total cases | row_perc | 19.0 | 13.0 | | |
| | | | | | 19.0 | 13.0 |
EDIT:
We do not need %nest%
if we do not want nested rows(i.e. twice more rows). In this case, the final part of the code should be modified as follows:
mtcars %>%
tab_cells(cyl) %>%
tab_cols(period,am) %>%
tab_subgroup(period ==c("2015-2016")) %>%
tab_stat_rpct(label = "row_perc") %>%
tab_subgroup(period ==c("2017-2018")) %>%
tab_stat_rpct(label = "row_perc") %>%
tab_pivot(stat_position = "outside_columns")
Output:
| | | Measurement period | Transmission | | |
| | | 2015-2016 | Automatic | Manual | Automatic |
| | | row_perc | row_perc | row_perc | row_perc |
| ------------------- | ------------ | ------------------ | ------------ | -------- | --------- |
| Number of cylinders | 4 | 100 | 27.3 | 72.7 | 27.3 |
| | 6 | 100 | 57.1 | 42.9 | 57.1 |
| | 8 | 100 | 85.7 | 14.3 | 85.7 |
| | #Total cases | 32 | 19.0 | 13.0 | 19.0 |
| Measurement period |
Manual | 2017-2018 |
row_perc | row_perc |
-------- | ------------------ |
72.7 | 100 |
42.9 | 100 |
14.3 | 100 |
13.0 | 32 |