Using the expss package, is it possible to run z-tests across 5 different binary variables? I figured out how to run significance testing on a single variable across different columns using the tab_cols argument, but I don't have any columns in this case. I'd like to treat the 5 variables I'm testing as 5 different columns (A, B, C, D, E) and run z-tests across all possible combinations.
If the proportion in column A is significantly greater than the proportion in column B, then I would like column A to display the letter "B" after the percentage, like what is shown here:
[Image: Z-tests across multiple variables]
Here's my attempt:
data %>%
    tab_cells(reaction_1_5, reaction_2_5, reaction_3_5, reaction_4_5, reaction_5_5) %>%
    tab_stat_cpct() %>%
    tab_last_sig_cpct() %>%
    tab_pivot()
Which outputs the following table:
| | | #Total |
| ------------ | ------------ | ------ |
| reaction_1_5 | 0 | 84.3 |
| | 1 | 15.7 |
| | #Total cases | 381 |
| reaction_2_5 | 0 | 80.8 |
| | 1 | 19.2 |
| | #Total cases | 381 |
| reaction_3_5 | 0 | 75.6 |
| | 1 | 24.4 |
| | #Total cases | 381 |
| reaction_4_5 | 0 | 82.4 |
| | 1 | 17.6 |
| | #Total cases | 381 |
| reaction_5_5 | 0 | 78.2 |
| | 1 | 21.8 |
| | #Total cases | 381 |
I believe the tab_last_sig_cpct function is not working because it computes the z-tests across columns, whereas I only have a single column. I'd like to test all possible combinations of the proportion of 1's (15.7 vs. 19.2 vs. 24.4 vs. 17.6 vs. 21.8) across my 5 variables.
Can this be implemented within the expss package?
Here's the data I'm using:
structure(list(reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L), reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
It is possible to combine variables side by side, as in your example:
library(expss)
data = structure(
    list(
        reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
        reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L),
        reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
        reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
        reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
    ),
    row.names = c(NA, -10L),
    class = "data.frame"
)
data %>%
    stack() %>%                       # long format: columns `values` and `ind`
    tab_cells("|" = values) %>%       # "|" to suppress variable names
    tab_cols("|" = ind) %>%
    tab_stat_cpct() %>%
    tab_pivot() %>%
    significance_cpct()
# | | reaction_1_5 | reaction_2_5 | reaction_3_5 | reaction_4_5 | reaction_5_5 |
# | | A | B | C | D | E |
# | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
# | 0 | 70.0 | 80.0 | 90.0 | 80.0 | 80.0 |
# | 1 | 30.0 | 20.0 | 10.0 | 20.0 | 20.0 |
# | #Total cases | 10 | 10 | 10 | 10 | 10 |
However, significance_cpct performs a statistical test for independent samples, whereas your percentages are all computed on the same sample, so a test for dependent samples is needed. At the moment, expss has no such test for proportions.
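One possible workaround outside expss, offered only as a sketch: McNemar's test is a common choice for comparing two proportions measured on the same respondents, and base R provides it as mcnemar.test. The code below runs it for every pair of the five variables on the sample data above. The names pairs and res, and the Holm correction for multiple comparisons, are my own assumptions and not part of expss or of the original question.

```r
# Sample data from the question (10 respondents, 5 binary variables)
data <- data.frame(
    reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
    reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L),
    reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
    reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
    reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
)

# All unordered pairs of variables (10 pairs for 5 variables)
pairs <- combn(names(data), 2, simplify = FALSE)

# McNemar's test on the paired 2x2 table of each pair.
# factor(..., levels = 0:1) keeps the table 2x2 even if a value is absent.
res <- do.call(rbind, lapply(pairs, function(p) {
    tab <- table(
        factor(data[[p[1]]], levels = 0:1),
        factor(data[[p[2]]], levels = 0:1)
    )
    data.frame(
        var1 = p[1],
        var2 = p[2],
        p_value = mcnemar.test(tab)$p.value
    )
}))

# Assumption: Holm adjustment for the multiple pairwise comparisons
res$p_adjusted <- p.adjust(res$p_value, method = "holm")

print(res)
```

With only 10 rows the p-values are not meaningful; the sketch is meant for the full 381-case data. Note that mcnemar.test returns NaN when two variables agree on every respondent, so such pairs would need separate handling. The resulting p-values would still have to be mapped to the column letters by hand, since this does not plug into the expss table output.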