r crosstab

Crosstabs with multiple top-level rows/cols


I would like to generate what is essentially a many-in-one crosstabs table.

I'll explain through an example. My data has a number of survey questions, q1...q5, each with four levels of answers (strongly agree, agree, disagree, strongly disagree). I also have four demographic variables (gender, region, age group, marital status). I would like to generate a crosstabs table that shows basically each of the questions against each of the demographic variables, like so:

           gender             region                  age group                     marital status
         M    F    X     North South East West   <25  25-34  35-44 >=45     Single Married Divorced Widowed
Q1  1    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    2    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    3    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    4    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
Q2  1    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    2    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    3    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N
    4    N    N    N       N    N      N   N      N     N      N     N         N      N        N      N

etc...

Each N represents a cell filled with a count or percentage.

All the crosstabs functions I can find will only allow n-way tables with one variable at each level. Is there a way to have multiple variables in each level, like in my example?

If I can somehow do this within the tidyverse, that would be best, but I'm open to other solutions as well.

Thanks!


Solution

  • For future reference, here's my very inelegant solution, using the tidyverse:

    library(dplyr)  # bind_cols(), bind_rows()
    library(tibble) # rownames_to_column()

    # `data` is the survey data frame; every column named below is a factor
    yelements<-c("Q1", "Q2", "Q3") # etc... names of the factor columns that will form the rows of the xtabs
    xelements<-c("region","gender","age_group","marital_status") # names of the factor columns that will form the columns of the xtabs
    Rows<-NULL # a helper table to create each set of rows individually
    FullXTab<-NULL
    for (i in yelements)  { # one set of rows per question
      for (w in xelements)  { # one block of columns per demographic variable
        x<-ftable(data[c(i,w)])
        x<-as.data.frame(as.matrix(x))
        names(x)<-paste(w,names(x),sep="_") # add the name of the variable to the names of each of the levels that will form individual columns to differentiate them
        names(x)<-gsub("V1","NA",names(x)) # blank items will turn into meaningless "V1" columns, so I replace that with NA
        if(is.null(Rows)) {
          x<-rownames_to_column(x,"answer") # make the y-axis factor levels into their own column
          Rows<-x
          } else {Rows<-bind_cols(Rows,x)}
      }
      Rows$Q<-i # create a column with the name of the y-axis vector, to differentiate different vectors with similar levels, e.g. question numbers
      if(is.null(FullXTab)) {FullXTab<-Rows} else {FullXTab<-bind_rows(FullXTab,Rows)}
      Rows<-NULL # reset the helper before the next question
    }
    

    For each question in yelements, this builds one table per variable in xelements and binds those tables side by side into one "wide" set of rows; it then stacks the sets of rows for all the questions into the full table. I'm sure there's a cleaner way to do this...
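
One possibly cleaner route, offered only as a sketch and not as the accepted way to do this: reshape everything to long format, count, and spread the demographic levels back out into columns. The toy `data` below is made up purely for illustration (two questions and two demographic variables, with assumed column names), and the code assumes dplyr 1.0+ and tidyr 1.1+ are installed.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for the real survey; all column names are assumptions.
set.seed(1)
n <- 200
data <- tibble(
  Q1     = factor(sample(1:4, n, replace = TRUE)),
  Q2     = factor(sample(1:4, n, replace = TRUE)),
  gender = factor(sample(c("M", "F", "X"), n, replace = TRUE)),
  region = factor(sample(c("North", "South", "East", "West"), n, replace = TRUE))
)

yelements <- c("Q1", "Q2")          # columns that form the rows of the table
xelements <- c("gender", "region")  # columns that form the column groups

FullXTab <- data %>%
  # work with plain character values so columns with different levels can be stacked
  mutate(across(all_of(c(yelements, xelements)), as.character)) %>%
  # one row per respondent and question
  pivot_longer(all_of(yelements), names_to = "Q", values_to = "answer") %>%
  # one row per respondent, question and demographic variable
  pivot_longer(all_of(xelements), names_to = "variable", values_to = "level") %>%
  # cell counts for every question/answer/variable/level combination
  count(Q, answer, variable, level) %>%
  # spread each variable_level pair into its own column, e.g. gender_F
  pivot_wider(names_from = c(variable, level), values_from = n, values_fill = 0L)

print(FullXTab)
```

The result has one row per question and answer, with columns named like `gender_F` or `region_North`, so it should line up with the layout produced by the loop. Percentages instead of counts could be derived afterwards, for example by dividing each block of columns by its row total.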