Search code examples
rdplyrfrequencypercentage

Is there a function to get multiple frequency tables within a table in R


I would like to get multiple frequency tables within a single table.

Here is my data:

df<-read.table(text=" group score   night   ticket  book    gender  course
A   Y   1   0   0   Male    M
A   Y   1   0   0   Female  N
A   N   1   1   1   Female  N
A   Y   2   1   1   Female  M
A   Y   2   1   1   Male    N
A   Y   2   0   0   Female  N
A   N   3   1   0   Male    N
B   N   3   1   1   Female  N
B   N   1   0   1   Female  M
B   Y   1   0   1   Female  M

",header=TRUE)

and the output would be :

    Frequency   Percent
Group       
A   7   70
B   3   30
score       
Y   4   40
N   6   60
night       
1   5   50
2   3   30
3   2   20
book        
0   4   40
1   6   60
gender      
Female  7   70
Male    3   30
course      
M   4   40
N   6   60

I have used the following codes:

df%>%
group_by( group, score, night, ticket, book, gender, course) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)

But it did not work for.


Solution

  • A general solution would be to apply the table function on each column of your frame. Typically table returns a named-vector, but you want a more frame-like presentation, so we'll augment that with as.data.frame.table.

    lst2 <- lapply(df, function(x) {
      out <- as.data.frame.table(table(x))
      out$Pct <- 100*out$Freq/sum(out$Freq)
      out
    })
    # or code-golf:
    # lapply(df, function(x) transform(as.data.frame.table(table(x)), Pct = 100*Freq/sum(Freq)))
    lst2
    # $group
    #   x Freq Pct
    # 1 A    7  70
    # 2 B    3  30
    # $score
    #   x Freq Pct
    # 1 N    4  40
    # 2 Y    6  60
    # $night
    #   x Freq Pct
    # 1 1    5  50
    # 2 2    3  30
    # 3 3    2  20
    # $ticket
    #   x Freq Pct
    # 1 0    5  50
    # 2 1    5  50
    # $book
    #   x Freq Pct
    # 1 0    4  40
    # 2 1    6  60
    # $gender
    #        x Freq Pct
    # 1 Female    7  70
    # 2   Male    3  30
    # $course
    #   x Freq Pct
    # 1 M    4  40
    # 2 N    6  60
    

    You can combine all of these elements with something like:

    do.call(rbind, c(Map(cbind, nm=names(lst2), lst2), list(make.row.names = FALSE)))
    #        nm      x Freq Pct
    # 1   group      A    7  70
    # 2   group      B    3  30
    # 3   score      N    4  40
    # 4   score      Y    6  60
    # 5   night      1    5  50
    # 6   night      2    3  30
    # 7   night      3    2  20
    # 8  ticket      0    5  50
    # 9  ticket      1    5  50
    # 10   book      0    4  40
    # 11   book      1    6  60
    # 12 gender Female    7  70
    # 13 gender   Male    3  30
    # 14 course      M    4  40
    # 15 course      N    6  60
    

    Edited to remove the row names by default.