Search code examples
rdplyrplyrtidyrexpss

Conditional Cross tabulation in R


Looking for the quickest way to achieve below task using "expss" package.

With a great package of "expss", we can easily do cross tabulation (which has other advantage and useful functions for cross-tabulations.), we can cross-tabulate multiple variables easily like below.

 #install.packages("expss")

 library("expss")
 data(mtcars)


  var1 <- "vs, am, gear, carb"
  var_names = trimws(unlist(strsplit(var1, split = ","))) 


  mtcars %>%
    tab_prepend_values %>%
    tab_cols(total(), ..[(var_names)]) %>%
    tab_cells(cyl) %>%
    tab_stat_cpct() %>%
    tab_pivot()

Above gives an output as: (column %)

                      #Total    vs          am          gear            carb                        
                                0     1     0     1     3     4   5     1   2    3   4    6    8 

  cyl             4    34.4   5.6  71.4  15.8  61.5   6.7  66.7  40  71.4  60                    
                  6    21.9  16.7  28.6  21.1  23.1  13.3  33.3  20  28.6           40  100      
                  8    43.8  77.8        63.2  15.4  80.0        40        40  100  60       100 
       #Total cases    32.0  18.0  14.0  19.0  13.0  15.0  12.0   5   7.0  10    3  10    1    1 

However, looking for an approach to create a table like below:

 CYL    |  VS = 0   |  AM = 1   |   Gear = 4 or Gear = 5    |  Carb (All)
   4        5.56        61.54               58.82                34.38
   6        16.67       23.08               29.41                21.88
   8        77.78       15.38               11.76                43.75

Total(col%) 100.00      100.00              100.00               100.00

Though i can achive this using dplyr and join functions but that is too complex incase we have to pass variables in runtime or dynamically.

Any help will be appriciable. Thanks!!


Solution

  • You may try this:

    1) Making a function which can create proportion out of the sum.

    myprop_tbl <- function(x){
        return(round(x*100/sum(x),2))
    }
    

    2) Using purrr's map, applying the function on your data frame and then binding the result.

    library(tidyverse)
    tab <- mtcars %>% 
        group_by(cyl) %>% 
        summarise(vs_sum = sum(vs==0), am_sum = sum(am==1), 
                  gear_sum = sum(gear == 4|gear==5), carb_sum= n())
    
    finaltab <- bind_cols(tab[,1],map_df(tab[,2:length(tab)], myprop_tbl))
    

    Output:

    # * cyl vs_sum am_sum gear_sum carb_sum
    #  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
    #1  4.00   5.56   61.5     58.8     34.4
    #2  6.00  16.7    23.1     29.4     21.9
    #3  8.00  77.8    15.4     11.8     43.8**
    

    EDIT:

    After had a discussion with OP, it seems he also wanted to pass string of functions,

    I am using here a package seplyr

    tab <- mtcars %>% 
        group_by(cyl) %>% 
        summarise_se(c("vs_sum = sum(vs==0)",
                  "am_sum = sum(am==1)",
                  "gear_sum = sum(gear == 4|gear==5)", 
                  "carb_sum = n()"))
    

    It works also, but weired names you will get, to fix that you can do this:

    This works perfectly as original answer which I have posted:

    tab <- mtcars %>% 
        group_by(cyl) %>% 
        summarise_se(c("vs_sum" := "sum(vs==0)",
                  "am_sum" := "sum(am==1)",
                  "gear_sum" := "sum(gear == 4|gear==5)", 
                  "carb_sum" := "n()"))
    

    You may read this here @ this link