
Significance testing rows in expss package


Using the expss package, is it possible to run z-tests across 5 different binary variables? I figured out how to run significance testing on a single variable across different columns using the tab_cols argument, but I don't have any columns in this case. I'd like to treat the 5 variables I'm testing as 5 different columns (A, B, C, D, E) and run z-tests across all possible combinations.

If the proportion in column A is significantly greater than the proportion in column B, then I would like column A to display the letter "B" after the percentage, like what is shown here:

[Image: Z-tests across multiple variables]

Here's my attempt:

data %>%
    tab_cells(reaction_1_5, reaction_2_5, reaction_3_5, reaction_4_5, reaction_5_5) %>%
    tab_stat_cpct()  %>%
    tab_last_sig_cpct() %>% 
    tab_pivot()

Which outputs the following table:

 |              |              | #Total |
 | ------------ | ------------ | ------ |
 | reaction_1_5 |            0 |   84.3 |
 |              |            1 |   15.7 |
 |              | #Total cases |    381 |
 | reaction_2_5 |            0 |   80.8 |
 |              |            1 |   19.2 |
 |              | #Total cases |    381 |
 | reaction_3_5 |            0 |   75.6 |
 |              |            1 |   24.4 |
 |              | #Total cases |    381 |
 | reaction_4_5 |            0 |   82.4 |
 |              |            1 |   17.6 |
 |              | #Total cases |    381 |
 | reaction_5_5 |            0 |   78.2 |
 |              |            1 |   21.8 |
 |              | #Total cases |    381 |

I believe the tab_last_sig_cpct function is not working because it computes the z-tests across columns, whereas I only have a single column. I'd like to test all possible combinations of the proportion of 1's (15.7 vs. 19.2 vs. 24.4 vs. 17.6 vs. 21.8) across my 5 variables.

Can this be implemented within the expss package?

Here's the data I'm using:

structure(list(reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L), reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • It is possible to combine variables side by side, as in your example:

    
    library(expss)
    data = structure(
        list(
            reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L,0L, 0L), 
            reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,1L), 
            reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
            reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
            reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
        ), 
        row.names = c(NA,-10L), 
        class = "data.frame"
    )
    
    data %>%
        stack() %>% 
        tab_cells("|" = values) %>% # "|" to suppress variable names
        tab_cols("|" = ind)  %>%
        tab_stat_cpct() %>% 
        tab_pivot() %>% 
        significance_cpct()
    
    # |              | reaction_1_5 | reaction_2_5 | reaction_3_5 | reaction_4_5 | reaction_5_5 |
    # |              |            A |            B |            C |            D |            E |
    # | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
    # |            0 |         70.0 |         80.0 |         90.0 |         80.0 |         80.0 |
    # |            1 |         30.0 |         20.0 |         10.0 |         20.0 |         20.0 |
    # | #Total cases |           10 |           10 |           10 |           10 |           10 |
    

    However, significance_cpct runs a statistical test for independent samples, whereas your percentages are all computed on the same sample. That calls for a test for dependent samples, and at the moment expss has no such test for proportions.
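
As a possible workaround outside expss, one common dependent-samples test for two binary variables measured on the same respondents is McNemar's test, available in base R as `mcnemar.test`. The sketch below is not part of the answer above and does not produce the lettered expss table; it only runs the pairwise tests. The Holm adjustment for multiple comparisons is an assumption added here, and with only 10 rows the chi-squared approximation is unreliable, so treat the numbers as illustrative.

```r
# Sample data from the question (10 respondents, 5 binary variables)
data = data.frame(
    reaction_1_5 = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
    reaction_2_5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L),
    reaction_3_5 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
    reaction_4_5 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L),
    reaction_5_5 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
)

# All unordered pairs of variable names (10 pairs for 5 variables)
pairs = combn(names(data), 2, simplify = FALSE)

# Run McNemar's test on each pair of variables
res = do.call(rbind, lapply(pairs, function(p) {
    # 2x2 table of paired answers; fixed factor levels keep the table
    # square even if a variable happens to contain only 0s or only 1s
    tbl = table(
        factor(data[[p[1]]], levels = 0:1),
        factor(data[[p[2]]], levels = 0:1)
    )
    test = mcnemar.test(tbl)
    data.frame(var1 = p[1], var2 = p[2], p_value = test$p.value)
}))

# Holm correction for the multiple pairwise comparisons (an added assumption)
res$p_adjusted = p.adjust(res$p_value, method = "holm")
res
```

Each row of `res` corresponds to one pair of variables. A pair with no discordant answers would give `NaN` as its p-value, because the test statistic is undefined in that case.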