
S3: is there a way to combine prop.table for character variables?


Noob here, I'm stuck trying to use S3 to summarise proportion data for a data.frame where there are four columns of character data. My goal is to build a summary method to show the proportions for every level of every variable at one time.

I can see how to get the proportion for each column:

a50survey1 <- table(Student1995$alcohol)
a50survey2 <- table(Student1995$drugs)
a50survey3 <- table(Student1995$smoke)
a50survey4 <- table(Student1995$sport)
prop.table(a50survey1)

                  Not  Once or Twice a week          Once a month           Once a week More than once a week 
                 0.10                  0.32                  0.24                  0.28                  0.06 

But I cannot find a way to combine all of the prop.table outputs into one summary output. Unless I'm really wrong, I cannot find an S3 method like summary.prop.table that would work for me. The goal is to set this up for the current data frame and then drop in new data frames of the same size and number of observations in the future.

I'm really a step by step guy and if you can help me, that would be great - thank you

Data frame info here. There are four columns, and each column has a different number of categorical options for its observations.

> dput(head(Student1995,5))
structure(list(alcohol = structure(c(3L, 2L, 2L, 2L, 3L), .Label = c("Not", 
"Once or Twice a week", "Once a month", "Once a week", "More than once a week"
), class = "factor"), drugs = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("Not", 
"Tried once", "Occasional", "Regular"), class = "factor"), smoke = structure(c(2L, 
3L, 1L, 1L, 1L), .Label = c("Not", "Occasional", "Regular"), class = "factor"), 
    sport = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Not regular", 
    "Regular"), class = "factor")), row.names = c(NA, 5L), class = "data.frame")

Edit: the summary() output, in case it helps

> summary(Student1995)
                  alcohol          drugs           smoke            sport   
 Not                  : 5   Not       :36   Not       :38   Not regular:13  
 Once or Twice a week :16   Tried once: 6   Occasional: 5   Regular    :37  
 Once a month         :12   Occasional: 7   Regular   : 7                   
 Once a week          :14   Regular   : 1                                   
 More than once a week: 3 

Solution

  • Maybe this is what you wanted. Within each variable the proportions sum to 1 (100%). The output below was computed on the five-row sample from the question's dput().

    lis <- sapply( Student1995, function(x) t( sapply( x, table ) ) )
    
    sapply( lis, function(x) colSums(prop.table(x)) )
    $alcohol
                      Not  Once.or.Twice.a.week          Once.a.month
                      0.0                   0.6                   0.4
              Once.a.week More.than.once.a.week
                      0.0                   0.0
    
    $drugs
           Not Tried.once Occasional    Regular
           0.8        0.2        0.0        0.0
    
    $smoke
           Not Occasional    Regular
           0.6        0.2        0.2
    
    $sport
    Not.regular     Regular
            0.4         0.6
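    
    The same per-variable proportions can probably be had more directly with a single lapply() call. This is a minimal sketch, assuming the five-row sample from the question's dput() output stands in for the full data; the name `props` is only illustrative.

    ```r
    # Rebuild the five-row sample from the question's dput() output.
    Student1995 <- data.frame(
      alcohol = factor(c(3, 2, 2, 2, 3), levels = 1:5,
                       labels = c("Not", "Once or Twice a week", "Once a month",
                                  "Once a week", "More than once a week")),
      drugs   = factor(c(1, 2, 1, 1, 1), levels = 1:4,
                       labels = c("Not", "Tried once", "Occasional", "Regular")),
      smoke   = factor(c(2, 3, 1, 1, 1), levels = 1:3,
                       labels = c("Not", "Occasional", "Regular")),
      sport   = factor(c(2, 1, 1, 2, 2), levels = 1:2,
                       labels = c("Not regular", "Regular"))
    )

    # One proportion table per column; table() keeps unused factor levels,
    # so every level appears even when its count is zero.
    props <- lapply(Student1995, function(x) prop.table(table(x)))

    props$alcohol
    ```

    Because table() works on each factor directly, the level names keep their spaces instead of being turned into dotted names.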
    

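    and the whole summary, pooling the values of all four columns (levels that share a label across variables, such as "Not" or "Regular", are counted together)...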

    prop.table( table(as.vector( sapply( Student1995, unlist ))) )
    
                     Not          Not regular           Occasional
                    0.35                 0.10                 0.05
            Once a month Once or Twice a week              Regular
                    0.10                 0.15                 0.20
              Tried once
                    0.05
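
    Since the question asks for an S3 summary method, here is one way it might be wired up. This is only a sketch: the class name "survey", the constructor as_survey() and the helper level_props() are all hypothetical names made up for illustration, and the data is again the five-row sample from the question.

    ```r
    # Five-row sample rebuilt from the question's dput() output.
    Student1995 <- data.frame(
      alcohol = factor(c(3, 2, 2, 2, 3), levels = 1:5,
                       labels = c("Not", "Once or Twice a week", "Once a month",
                                  "Once a week", "More than once a week")),
      drugs   = factor(c(1, 2, 1, 1, 1), levels = 1:4,
                       labels = c("Not", "Tried once", "Occasional", "Regular")),
      smoke   = factor(c(2, 3, 1, 1, 1), levels = 1:3,
                       labels = c("Not", "Occasional", "Regular")),
      sport   = factor(c(2, 1, 1, 2, 2), levels = 1:2,
                       labels = c("Not regular", "Regular"))
    )

    # Hypothetical constructor: tag a data frame with the class "survey".
    as_survey <- function(df) {
      stopifnot(is.data.frame(df))
      class(df) <- c("survey", class(df))
      df
    }

    # Hypothetical helper: proportions of each level as a named numeric vector.
    level_props <- function(x) {
      p <- prop.table(table(x))
      setNames(as.vector(p), names(p))
    }

    # summary() method for "survey": one proportion vector per column.
    summary.survey <- function(object, ...) {
      out <- lapply(object, level_props)
      class(out) <- "summary.survey"
      out
    }

    # print() method so the result displays as one block per variable.
    print.summary.survey <- function(x, digits = 2, ...) {
      for (nm in names(x)) {
        cat(nm, "\n", sep = "")
        print(round(x[[nm]], digits))
        cat("\n")
      }
      invisible(x)
    }

    s <- summary(as_survey(Student1995))
    s
    ```

    A new data frame with the same columns can then be passed through as_survey() and summary() in the same way, which seems to be what the question is after.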