I'd like to count the number of TRUE values of one variable for each category of another variable, or more precisely, the average number of TRUE values for each category. Afterward, I'd like to store the result in a vector.
The code I use works fine when the variable, like Var2, contains both TRUE and FALSE values. However, when the variable contains only FALSE, like Var3, my code creates a vector with the value NULL. Instead, I would like it to create a vector with the value 0 for each category.
Any ideas?
Here is an example:
# Create some example data
df <- data.frame(
  Var1 = c("A", "B", "A", "B", "A", "C"),
  Var2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Var3 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)
# This works fine
count1 = as.vector(tapply(df$Var1, df$Var2, table)$'TRUE')/nrow(df)
# This does not: it only creates a NULL vector
count2 = as.vector(tapply(df$Var1, df$Var3, table)$'TRUE')/nrow(df)
count1 works as intended, but count2 is a NULL vector.
Thanks!
Here is how to dynamically build it in base R without having to explicitly name the variables.
i1 <- as.formula(paste("cbind(",
                       paste(setdiff(names(df), "Var1"), collapse = ", "),
                       ") ~ Var1"))
res <- aggregate(i1, df, \(i) sum(i) / nrow(df))
res
#  Var1      Var2 Var3
#1    A 0.5000000    0
#2    B 0.0000000    0
#3    C 0.1666667    0
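
If plain vectors are wanted, as in the question, one possible alternative is to swap the roles of the two arguments to tapply: group the logical column by Var1 and sum it, rather than grouping Var1 by the logical column. This is only a sketch using the example data from the question, not part of the aggregate approach above. The names count1 and count2 are reused from the question.

```r
# Same example data as in the question
df <- data.frame(
  Var1 = c("A", "B", "A", "B", "A", "C"),
  Var2 = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  Var3 = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Group the logical column by Var1 and sum it. The sum of an all-FALSE
# group is 0, so every category of Var1 gets an entry even when the
# column contains no TRUE values at all.
count1 <- as.vector(tapply(df$Var2, df$Var1, sum)) / nrow(df)
count2 <- as.vector(tapply(df$Var3, df$Var1, sum)) / nrow(df)

count1
count2
```

Because the grouping is done by Var1, the result always has one element per category (here A, B, C, in that order), so a column without any TRUE values gives a vector of zeros instead of an empty result.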