Search code examples
rapplysummary

How to summarise multiple vriables/items across age group at once


I have a dataframe containing over 150,000 entries. Example below:

ID <-    1111, 1222, 3333, 4444, 1555, 6666
V1 <-    1,     0,    1,    0,    0,     0
V2 <-    1,     0,    0,    0,    0,     1
V3 <-    0,     1,    1,    0,    0,     1
V4 <-    1,     0,    1,    1,    0,     0
AgeGr <- 15-24,24-35,15-24,35-48, 48+, 35-48

All the variables (V1-V4 in the example) are dichotomous questions answered in 0/1. Now I would like to summarise all the occurrence of 0/1 of every variable across age groups. I expect output like this:

Variable       V1      V2      V3      V4    # Variale names
Answer        0  1    0  1    0  1    0  1   # answer levels (1/0)
15-24         0  2    1  1    1  1    0  2   # the frequency of "0" and "1" under this age group
24-35         1  0    1  0    0  1    1  0   
35-48         2  0    1  1    1  1    0  1
48+           1  0    1  0    1  0    1  0

I have tried janitor::tabyl, using tabyl(df,AgeGr, V1). Yet it only summarises the V1 in one line. When I tried tabyl(df,AgeGr, df[,V1:V4]), it failed. I am wondering if I could go with tabyl() and use functions like apply()? or should I turn to something else?

Any suggestions will be highly appreciated. Thank you in advance:)


Solution

  • You could do the following:

    ID <-    c(1111, 1222, 3333, 4444, 1555, 6666)
    V1 <-    c(1,     0,    1,    0,    0,     0)
    V2 <-    c(1,     0,    0,    0,    0,     1)
    V3 <-    c(0,     1,    1,    0,    0,     1)
    V4 <-    c(1,     0,    1,    1,    0,     0)
    AgeGr <- c("15-24","24-35","15-24","35-48", "48+", "35-48")
    
    df <- data.frame(ID=ID,V1=V1,V2=V2,V3=V3,V4=V4,AgeGr = AgeGr, stringsAsFactors = FALSE)
    
    ageAnswerSplit <- split(df[,c("V1","V2","V3","V4")],df[["AgeGr"]])
    
    summarized <- do.call("rbind",lapply(ageAnswerSplit, function(answerdf) {
      answertables <- lapply(names(answerdf), function(nam) {
        at <- table(answerdf[[nam]])
        setNames(data.frame(unname(at["0"]),unname(at["1"])),paste0(nam,":",c(0,1)))
      })
      do.call("cbind",answertables)
    }))
    summarized[is.na(summarized)] <- 0
    

    resulting in

    > summarized
          V1:0 V1:1 V2:0 V2:1 V3:0 V3:1 V4:0 V4:1
    15-24    0    2    1    1    1    1    0    2
    24-35    1    0    1    0    0    1    1    0
    35-48    2    0    1    1    1    1    1    1
    48+      1    0    1    0    1    0    1    0