Search code examples
rloopscrosstab

How to loop through several columns to generate multiple crosstabs


I'm trying to create a document that has several crosstabs using multiple y variables and a single x variable. Each y variable should have a separate table. I can do this with each individual crosstab in markdown and use the kableextra package to generate the html tables. However, I have several variables and it would just be easier to do this with one loop. In Stata, I would so something like:

foreach i of varlist var1 var2 var3 {
  tab tab `i' year, row
}

The issue that I'm having with Stata is that it doesn't apply frequency weights in tab. R does apply frequency weights in the crosstab (descr package), and it produces row and column percents.

Here is a sample dataframe:

structure(list(survey_yr = c(2019, 2020, 2019, 2020, 2019, 2020, 
2019, 2020, 2019, 2020, 2019, 2020, 2020, 2019, 2019, 2020, 2019, 
2020, 2019, 2020), Main_Data = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Survey_Month = c(6, 6, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 9, 9, 9, 9, 9), Quarter = c(1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Q1A_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2), Q1B_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1C_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1D_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1E_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 2), Q1F_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 2), Q1G_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1H_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1I_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 2, 1, 2, 3, 3, 3, 3, 3, 1, 3, 2), Q1J_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1K_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q1L_3L = c(3, 
3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1M_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 2, 3, 2), Q1N_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 2, 1, 3, 3, 3, 3, 1, 3, 3, 3, 2), Q1O_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1P_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2), Q1Q_3L = c(3, 
1, 3, 3, 3, 1, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 2), Q2_3L = c(3, 
2, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 2), weight = c(0.680000007152557, 
0.680000007152557, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 1.27100002765656, 0.823000013828278, 0.823000013828278, 
0.823000013828278, 0.823000013828278, 0.823000013828278, 0.823000013828278, 
1.57599997520447, 0.823000013828278, 0.823000013828278)), row.names = c(4L, 
5L, 6L, 7L, 9L, 10L, 11L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
23L, 26L, 27L, 28L, 31L, 33L), class = "data.frame")

The code that I've been using for the single crosstab, which produces a really nice table is:

ct2=crosstab(dat$Q2_3L,dat$survey_yr, weight = dat$weight, 
       total.c = T, plot = F)
ct2_tab<- descr:::CreateNewTab(ct2)
class(ct2_tab)
kable(ct2_tab) %>%
  kable_classic(full_width = F, html_font = "Cambria")

I would like to figure out a way to write a loop that does this over multiple columns. This is close, but I want specific columns, not all columns in the dataframe. Secondly, it renames the actual column name to "col", while I need to keep the original column names. Finally, I can't figure out how to export this to an html, docx, excel, or whatever type of document.

for (col in df) {
  ct_=crosstab(col, dat1$survey_yr, 
      weight = dat1$weight, format = "SPSS", prop.c = T, plot = F)
 print(ct_)

Thanks in advance.


Solution

  • I think you can use

    library(descr)
    
    for (col in names(df)[5:22]) {
      ct_ <- crosstab(df[[col]], 
                      df[["survey_yr"]], 
                      weight = df[["weight"]], 
                      format = "SAS", 
                      prop.c = TRUE, 
                      plot = FALSE)
      ct_[["RowData"]] <- col
      ct_[["ColData"]] <- "survey_yr"
      print(ct_)
    }
    

    The names(df)[5:22] iterates over the columns like "Q1H_3". This returns something like

       Cell Contents 
    |-------------------------|
    |                       N | 
    |           N / Col Total | 
    |-------------------------|
    
    ===============================
              survey_yr
    Q1Q_3L     2019    2020   Total
    -------------------------------
    1             1       2       3
              0.125   0.222        
    -------------------------------
    2             0       2       2
              0.000   0.222        
    -------------------------------
    3             7       5      12
              0.875   0.556        
    -------------------------------
    Total         8       9      17
              0.471   0.529        
    ===============================
       Cell Contents 
    |-------------------------|
    |                       N | 
    |           N / Col Total | 
    |-------------------------|
    
    ==============================
             survey_yr
    Q2_3L     2019    2020   Total
    ------------------------------
    1            1       3       4
             0.125   0.333        
    ------------------------------
    2            0       2       2
             0.000   0.222        
    ------------------------------
    3            7       4      11
             0.875   0.444        
    ------------------------------
    Total        8       9      17
             0.471   0.529        
    ==============================
    

    Print to txt

    You can save this output to a file (for example a .txt-file) using sink():

    for (col in names(df)[5:22]) {
      sink(file = paste0(col, ".txt"))
      ct_ <- crosstab(df[[col]], 
                      df[["survey_yr"]], 
                      weight = df[["weight"]], 
                      format = "SAS", 
                      prop.c = TRUE, 
                      plot = FALSE)
      ct_[["RowData"]] <- col
      ct_[["ColData"]] <- "survey_yr"
      print(ct_)
      sink()
    }
    

    This creates multiple files in your current working directory, for example Q1A_3L.txt and Q1B_3L.txt.