I'm following very useful solution on creating a summary column for multiple categories. As discussed in the linked solution, I am working with a code that generates the percentage column for each subgroup.
mtcars %>%
group_by (am, gear) %>%
summarise (n=n()) %>%
mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
The code generates the desired values:
## Source: local data frame [4 x 4]
## Groups: am
##
## am gear n rel.freq
## 1 0 3 15 79%
## 2 0 4 4 21%
## 3 1 4 8 62%
## 4 1 5 5 38%
I would like modify this code to dynamically created columns pertaining to the unique categories available in the second category passed in the dplyr
call. This would be gear
in case of the attached example. So in case of the attached example, the resulting data frame would look like that:
am gear n rel.freq_gear3 rel.freq_gear4 rel.freq_gear5
1 0 3 15 79% 21%
2 1 4 8 0 62% 38%
For a small number of categories I presume that I would be able to make use of the summarising the values in conditionally
, as discussed here, where I would try to execute dplyr
statements only for specified conditions sumBfoo = sum(B[A=="foo"]))
. However, this approach would be inefficient when dealing with multiple categories. Outside dplyr
solution could be developed with use of a loop and jumping through unique values of the desired category but my desire would be to do this in dplyr
.
Broadly speaking, I would like to create a table similar to the one below:
library(gmodels)
CrossTable(mtcars$am, mtcars$gear)
Cell Contents
|-------------------------|
| N |
| Chi-square contribution |
| N / Row Total |
| N / Col Total |
| N / Table Total |
|-------------------------|
Total Observations in Table: 32
| mtcars$gear
mtcars$am | 3 | 4 | 5 | Row Total |
-------------|-----------|-----------|-----------|-----------|
0 | 15 | 4 | 0 | 19 |
| 4.169 | 1.371 | 2.969 | |
| 0.789 | 0.211 | 0.000 | 0.594 |
| 1.000 | 0.333 | 0.000 | |
| 0.469 | 0.125 | 0.000 | |
-------------|-----------|-----------|-----------|-----------|
1 | 0 | 8 | 5 | 13 |
| 6.094 | 2.003 | 4.339 | |
| 0.000 | 0.615 | 0.385 | 0.406 |
| 0.000 | 0.667 | 1.000 | |
| 0.000 | 0.250 | 0.156 | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 15 | 12 | 5 | 32 |
| 0.469 | 0.375 | 0.156 | |
-------------|-----------|-----------|-----------|-----------|
But I'm interested only in row proportions without counts and totals and other gadgets.
dplyr
Building of the comment by @docendo discimus
:
library(tidyr)
count(mtcars, am, gear) %>%
mutate(rel.freq = n/sum(n)) %>%
spread(gear, rel.freq) %>%
group_by(am) %>%
summarize_each(funs(sum2 = sum(., na.rm = TRUE))) %>%
mutate_each(funs(perc = paste0(round(100 * ., 0), "%")), -am, -n)
Produces:
Source: local data frame [2 x 5]
am n 3 4 5
(dbl) (int) (chr) (chr) (chr)
1 0 19 79% 21% 0%
2 1 13 0% 62% 38%
base
prop.table(table(mtcars$am, mtcars$gear), 1) %>%
round(2) %>%
'*'(100)
Produces:
3 4 5
0 79 21 0
1 0 62 38