Consider the following data:
library(dplyr)

df <- structure(list(Total = c(3450, 1728, 122, 5300),
                     A1 = c(1092, 497, 4, 1593),
                     A2 = c(596, 156, 29, 781),
                     A3 = c(801, 417, 36, 1254),
                     A4 = c(107, 11, 0, 118),
                     A5 = c(614, 217, 21, 852),
                     A6 = c(132, 47, 0, 179),
                     A7 = c(108, 383, 32, 523)),
                .Names = c("Total", paste0("A", 1:7)),
                row.names = c(paste0("B", 1:3), "Total"),
                class = "data.frame")

      Total   A1  A2   A3  A4  A5  A6  A7
B1     3450 1092 596  801 107 614 132 108
B2     1728  497 156  417  11 217  47 383
B3      122    4  29   36   0  21   0  32
Total  5300 1593 781 1254 118 852 179 523
The problem is as follows: taking the row Total as the denominator, calculate each cell as a percentage of the Total value in the same column.
In this case, I would do
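df_names <- rownames(df)

# Divide each column by its value in the "Total" row, scale to percent,
# round to two decimals, then append "%" to every cell.
df2 <- apply(
  round(
    sweep(df,
          STATS = as.matrix(df[rownames(df) == "Total", ]),
          MARGIN = 2,
          FUN = "/") * 100,
    digits = 2),
  MARGIN = 2,
  FUN = paste0,
  "%")

# The result is a character matrix without row names, so restore them
# and convert back to a data frame.
rownames(df2) <- df_names
df2 <- as.data.frame(df2)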
       Total     A1     A2     A3     A4     A5     A6     A7
B1    65.09% 68.55% 76.31% 63.88% 90.68% 72.07% 73.74% 20.65%
B2     32.6%  31.2% 19.97% 33.25%  9.32% 25.47% 26.26% 73.23%
B3      2.3%  0.25%  3.71%  2.87%     0%  2.46%     0%  6.12%
Total   100%   100%   100%   100%   100%   100%   100%   100%
Although this works, I would prefer to use %>%:
df3 <- sweep(df,
             STATS = as.matrix(df[rownames(df) == "Total", ]),
             MARGIN = 2,
             FUN = "/") * 100 %>%
  round(digits = 2) %>%
  apply(., MARGIN = 2, FUN = paste0, "%")
Error in apply(., MARGIN = 2, FUN = paste0, "%") :
dim(X) must have a positive length
Why does the method above using %>% not work?
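The issue is the trailing numeric 100 and how it enters the pipe. %>% binds more tightly than *, so only 100 is piped into round() and apply(), and apply() fails because a bare number has no dimensions. Wrap the whole sweep(...) * 100 expression in () and apply will work fine, or for a less involved solution: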
(sweep(df,
       STATS = as.matrix(df[rownames(df) == "Total", ]),
       MARGIN = 2,
       FUN = "/") * 100) %>%
  round(digits = 2) %>%
  mutate_all(function(x) sprintf('%s%%', x))

   Total     A1     A2     A3     A4     A5     A6     A7
1 65.09% 68.55% 76.31% 63.88% 90.68% 72.07% 73.74% 20.65%
2  32.6%  31.2% 19.97% 33.25%  9.32% 25.47% 26.26% 73.23%
3   2.3%  0.25%  3.71%  2.87%     0%  2.46%     0%  6.12%
4   100%   100%   100%   100%   100%   100%   100%   100%
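The precedence can be checked on plain numbers, without the data frame. This is only a minimal sketch: the values 10 and 4 and the names unparenthesised and parenthesised are illustrative and not taken from the question. Because %>% binds more tightly than *, only 4 reaches sqrt() in the first expression.

```r
library(magrittr)  # dplyr re-exports the same %>% operator

# %>% has higher precedence than *, so this parses as 10 * (4 %>% sqrt()).
unparenthesised <- 10 * 4 %>% sqrt()

# With parentheses the whole product is piped, i.e. sqrt(10 * 4).
parenthesised <- (10 * 4) %>% sqrt()

print(unparenthesised)  # 10 * sqrt(4)
print(parenthesised)    # sqrt(40)
```

The data frame case fails for the same reason: the sweep() result is multiplied by whatever the pipe returns, instead of being fed into it.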
The %>% operator takes two arguments, lhs and rhs, and sets up an environment in which rhs is evaluated with lhs as its input value. Without parentheses the lhs is only 100, as piping into print shows:
sweep(df,
      STATS = as.matrix(df[rownames(df) == "Total", ]),
      MARGIN = 2,
      FUN = "/") * 100 %>% print

[1] 100
           Total          A1         A2         A3         A4         A5        A6         A7
B1     65.094340  68.5499058  76.312420  63.875598  90.677966  72.065728  73.74302  20.650096
B2     32.603774  31.1989956  19.974392  33.253589   9.322034  25.469484  26.25698  73.231358
B3      2.301887   0.2510986   3.713188   2.870813   0.000000   2.464789   0.00000   6.118547
Total 100.000000 100.0000000 100.000000 100.000000 100.000000 100.000000 100.00000 100.000000
Now with parens
(sweep(df,
       STATS = as.matrix(df[rownames(df) == "Total", ]),
       MARGIN = 2,
       FUN = "/") * 100) %>% print

           Total          A1         A2         A3         A4         A5        A6         A7
B1     65.094340  68.5499058  76.312420  63.875598  90.677966  72.065728  73.74302  20.650096
B2     32.603774  31.1989956  19.974392  33.253589   9.322034  25.469484  26.25698  73.231358
B3      2.301887   0.2510986   3.713188   2.870813   0.000000   2.464789   0.00000   6.118547
Total 100.000000 100.0000000 100.000000 100.000000 100.000000 100.000000 100.00000 100.000000
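How R groups the two versions can also be inspected without evaluating anything, by looking at the parsed calls with base R's quote(). In this sketch x is a placeholder symbol standing in for the sweep(...) result; it is never evaluated, and the variable names are illustrative.

```r
# quote() captures the expression unevaluated, so neither x nor %>% needs to exist.
without_parens <- quote(x * 100 %>% round(digits = 2))
with_parens    <- quote((x * 100) %>% round(digits = 2))

# The first element of a call is its outermost function.
outer_without <- as.character(without_parens[[1]])  # "*": the pipe is nested inside the product
outer_with    <- as.character(with_parens[[1]])     # "%>%": the product is the pipe's lhs

# The right operand of * is the whole pipe expression, which starts from 100.
piped_part <- deparse(without_parens[[3]])

print(outer_without)
print(outer_with)
print(piped_part)
```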
Braces also work:
sweep(df,
      STATS = as.matrix(df[rownames(df) == "Total", ]),
      MARGIN = 2,
      FUN = "/") %>% {
  round(. * 100, 2) %>% mutate_all(function(x) sprintf('%s%%', x))
}

   Total     A1     A2     A3     A4     A5     A6     A7
1 65.09% 68.55% 76.31% 63.88% 90.68% 72.07% 73.74% 20.65%
2  32.6%  31.2% 19.97% 33.25%  9.32% 25.47% 26.26% 73.23%
3   2.3%  0.25%  3.71%  2.87%     0%  2.46%     0%  6.12%
4   100%   100%   100%   100%   100%   100%   100%   100%
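Another way to sidestep the precedence issue is to keep the multiplication inside the pipe as an ordinary function call. The sketch below makes a few assumptions: magrittr is attached directly (dplyr re-exports %>% but not magrittr's multiply_by() alias), only two columns of the data are used for brevity, and the "%" is appended with base R rather than mutate_all(), which replaced the row names with 1 to 4 in the outputs above. The names small and pct are illustrative.

```r
library(magrittr)

# A two-column subset of the question's data, enough to show the idea.
small <- data.frame(Total = c(3450, 1728, 122, 5300),
                    A1    = c(1092, 497, 4, 1593),
                    row.names = c("B1", "B2", "B3", "Total"))

pct <- small %>%
  sweep(MARGIN = 2, STATS = unlist(small["Total", ]), FUN = "/") %>%
  multiply_by(100) %>%   # same as * 100, but written as a function call
  round(digits = 2)

# Append "%" column by column; assigning into pct[] keeps the data frame
# structure and its row names.
pct[] <- lapply(pct, paste0, "%")

print(pct)
```

Since every step is a function call, no operator competes with %>% for precedence and no extra parentheses are needed.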