r compare expss

Compare two variables (both numeric or both factors) in expss tables


I am digging deeper and deeper into the expss package, and I am stuck on one of the examples mentioned here --> https://gdemin.github.io/expss/#example_of_data_processing_with_multiple-response_variables (more particularly, the last table of the section).

Consider the following dataframes:

library(expss)

vecA <- factor(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)),levels=c(1,2,3,4,5))
vecB <- factor(c(rep(1,20),rep(2,20),rep(NA,10)),levels=c(1,2,3,4,5))
df_fact <- data.frame(vecA, vecB)

vecA_num <- as.numeric(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)))
vecB_num <- as.numeric(c(rep(1,20),rep(2,20),rep(NA,10)))
df_num <- data.frame(vecA_num, vecB_num)

Strictly copying the suggested code (URL above), here is the code that builds my table:

df_fact %>%
  tab_cols(total(label = "#Total| |")) %>% 
  tab_cells(list(vecA)) %>%
  tab_stat_cpct(label="vecA", total_row_position="above", total_statistic="u_cases") %>%
  tab_cells(list(vecB)) %>% 
  tab_stat_cpct(label="vecB", total_row_position="above", total_statistic="u_cases") %>%
  tab_pivot(stat_position = "inside_columns") %>%  
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)

Slightly different procedure with a numeric example:

df_num %>%
  tab_cols(total(label = "#Total| |")) %>% 
  tab_cells(vecA_num, vecB_num) %>%
  tab_stat_valid_n(label = "Valid N") %>%
  tab_stat_mean(label="Mean") %>%
  tab_pivot(stat_position = "inside_columns") %>%  
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
  tab_transpose()

Issues start here, since these complex constructs are... complex!

1) I would like to include the tab_last_sig* family of functions, but I cannot figure out how to do it (and possibly subtotals/nets when the variables are factors)

2) Including multiple statistics (cases, percents, means...) together in one table is a challenge

3) Last, it is not clear to me where I should write the statistic names / variable names

I have not found detailed documentation for these constructs, hence this message in a bottle :)


Solution

    1. Unfortunately, significance testing is currently supported only for independent samples. In your examples you want to compare statistics on dependent samples (two variables measured on the same cases). You can run the significance calculations for independent proportions, but the results will be inaccurate.
    2. Including multiple statistics is not difficult: you just need to call the tab_stat_* functions one after another. But a complex table layout really is a challenge :(
    3. Variable names always go in tab_cells. After that, add the statistic functions such as tab_stat_mean, tab_stat_cpct, and so on. You can find the documentation by typing ?tab_pivot in the R console, which is the standard way to open the manual page for an R function.
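
Points 2 and 3 can be illustrated with a minimal sketch. It assumes data shaped like the frames in the question (rebuilt here so the snippet runs on its own); the labels "Valid N", "Mean", "Std. dev.", "Cases" and "Column %" are arbitrary choices, and the layout options are just one possibility, not the only way to arrange the table.

```r
library(expss)  # provides the tab_* functions and re-exports %>%

# Numeric data shaped like df_num in the question
df_num <- data.frame(
  vecA_num = rep(1:5, each = 10),
  vecB_num = c(rep(1, 20), rep(2, 20), rep(NA, 10))
)

# Variables go in tab_cells(); each tab_stat_*() call adds one statistic
tbl_num <- df_num %>%
  tab_cells(vecA_num, vecB_num) %>%
  tab_cols(total()) %>%
  tab_stat_valid_n(label = "Valid N") %>%
  tab_stat_mean(label = "Mean") %>%
  tab_stat_sd(label = "Std. dev.") %>%
  tab_pivot()

# Same pattern for a factor: cases and column percentages in one table
df_fact <- data.frame(vecA = factor(rep(1:5, each = 10), levels = 1:5))

tbl_fact <- df_fact %>%
  tab_cells(vecA) %>%
  tab_cols(total()) %>%
  tab_stat_cases(label = "Cases") %>%
  tab_stat_cpct(label = "Column %") %>%
  tab_pivot(stat_position = "inside_columns")

tbl_num
tbl_fact
```

The label argument of each tab_stat_* call is where the statistic name is written, and the stat_position argument of tab_pivot controls whether the statistics are stacked in rows or placed in columns.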