I want to create a large proportion table that involves filtering out certain values based on one column and outputting the proportion of values equal to 0 and those greater than 0 in table. Here's an example of the data frame (df):
ID a b c d e f g
1 1 1 2 3 0 4 5 A
2 2 0 0 1 0 2 0 A
3 3 1 5 2 1 0 0 B
4 4 5 1 2 0 1 1 B
5 5 2 0 1 0 0 0 C
...
From this, I want to come up with the proportion that b=0 or b>0 IF column a>0. For your reference, I can get this information with the following code:
prop.table(table(df$b[df$a>0]!=0))*100
However, I want to do the same with columns c and d as well as e and f (same sort of pattern so that you're filtering out when c=0 and when e=0 to get those >0 and =0 proportions for d and f, respectively). Additionally, I would love to have this output all into a single table. Might look something like this:
b.perc d.perc f.perc
TRUE 75.00 20.00 66.67
FALSE 25.00 80.00 33.33
Any help is appreciated. Also, I would like to calculate the TRUE percentages across groups listed in column G, giving me an output like this:
b.perc d.perc f.perc
A 100.00 0.00 50.00
B 100.00 50.00 100.00
C 0.00 0.00 0.00
We subset the alternate columns, use each set as inputs to mapply
, get the table
and prop.table
based on the condition mentioned in the OP's post
out <- round(mapply(function(x, y) prop.table(table(x[y > 0] != 0)) * 100,
df[c(FALSE, TRUE)], df[c(TRUE, FALSE)]), 2)
colnames(out) <- paste0(colnames(out), ".perc")
out
# b.perc d.perc f.perc
#FALSE 25 80 33.33
#TRUE 75 20 66.67
If we are just interested in the TRUE percentage, then we can do this with colMeans
as well
colMeans((df[c(FALSE, TRUE)] * NA^!(df[c(TRUE, FALSE)] > 0)) != 0, na.rm = TRUE)
# b d f
#0.7500000 0.2000000 0.6666667
df <- structure(list(a = c(1L, 0L, 1L, 5L, 2L), b = c(2L, 0L, 5L, 1L,
0L), c = c(3L, 1L, 2L, 2L, 1L), d = c(0L, 0L, 1L, 0L, 0L), e = c(4L,
2L, 0L, 1L, 0L), f = c(5L, 0L, 0L, 1L, 0L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5"))