Let's say I have a data.frame of the following structure:
    fac1   fac2   fac3 val
1  Apple  Fresh    Red   2
2  Apple    Old    Red   3
3  Apple Hazard    Red   1
4 Banana  Fresh Yellow   4
5 Banana    Old Yellow   5
6 Banana Hazard Yellow   1
7  Berry  Fresh Purple   1
8  Berry    Old Purple   1
9  Berry Hazard Purple   3
I want to sum up val for those rows where fac2 equals Fresh or Old, separately for each factor level of fac1, and end up with a data frame like this:
    fac1   fac3 sum.freshold
1  Apple    Red            5
2 Banana Yellow            9
3  Berry Purple            2
Moreover, I want to specify the factor levels in the condition (fac2 should equal Fresh or Old) by their character labels (i.e., "Fresh" and "Old"), not by the underlying integers that represent these factor levels (i.e., 1 and 3).
> str(mydf$fac2)
Factor w/ 3 levels "Fresh","Hazard",..: 1 3 2 1 3 2 1 3 2
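To make the label-versus-integer distinction concrete, here is a small sketch on a stand-alone factor. The vector f is hypothetical and only mirrors one block of fac2; it assumes R's default behaviour of sorting factor levels alphabetically.

```r
# Hypothetical stand-alone factor mirroring one block of fac2
f <- factor(c("Fresh", "Old", "Hazard"))

lev   <- levels(f)       # level labels, sorted alphabetically by default
codes <- as.integer(f)   # underlying integer codes for each element

# Comparisons against character strings use the labels, not the codes
is_fresh_or_old <- f %in% c("Fresh", "Old")

print(lev)
print(codes)
print(is_fresh_or_old)
```

So a condition written with "Fresh" and "Old" should not need to know which integer each level happens to be stored as.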
Here is the example:
mydf <- data.frame(fac1 = c(rep("Apple", 3), rep("Banana", 3), rep("Berry", 3)),
                   fac2 = rep(c("Fresh", "Old", "Hazard"), 3),
                   fac3 = c(rep("Red", 3), rep("Yellow", 3), rep("Purple", 3)),
                   val = c(2, 3, 1, 4, 5, 1, 1, 1, 3),
                   stringsAsFactors = TRUE)
One attempt of mine, which does not work and would not return a data.frame anyway:
tapply(mydf$val, mydf$fac1, function(x) {x[mydf$fac2 == "Fresh"] + x[mydf$fac2 == "Old"]})
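A likely reason the attempt above misbehaves: inside the function, x holds only the three values of one fac1 group, while mydf$fac2 == "Fresh" is a logical vector over all nine rows, so the index reaches past the end of x and produces NA. The following is only a sketch of one way around that, filtering the rows before grouping; keep and sums are names introduced here for illustration.

```r
mydf <- data.frame(fac1 = c(rep("Apple", 3), rep("Banana", 3), rep("Berry", 3)),
                   fac2 = rep(c("Fresh", "Old", "Hazard"), 3),
                   fac3 = c(rep("Red", 3), rep("Yellow", 3), rep("Purple", 3)),
                   val = c(2, 3, 1, 4, 5, 1, 1, 1, 3),
                   stringsAsFactors = TRUE)

# Select rows by level label first, on the full-length vectors
keep <- mydf$fac2 %in% c("Fresh", "Old")

# Then sum val within each fac1 level, using only the kept rows
sums <- tapply(mydf$val[keep], mydf$fac1[keep], sum)
print(sums)
```

This still returns a named array rather than a data.frame, and it drops fac3, so it addresses only the summing part of the question.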
The tidyverse (dplyr) solution:
library(tidyverse)

mydf %>%
  filter(fac2 %in% c("Fresh", "Old")) %>%  # match levels by label, not integer code
  summarise(
    .by = c(fac1, fac3),                   # per-call grouping; needs dplyr >= 1.1.0
    sum.freshold = sum(val))
Output:
    fac1   fac3 sum.freshold
1  Apple    Red            5
2 Banana Yellow            9
3  Berry Purple            2
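If loading the tidyverse is not an option, the same filter-then-sum idea can be sketched in base R with subset() and aggregate(). This is an alternative to the answer above rather than part of it; res is a name introduced here, and the row order of the result may differ from the dplyr output.

```r
mydf <- data.frame(fac1 = c(rep("Apple", 3), rep("Banana", 3), rep("Berry", 3)),
                   fac2 = rep(c("Fresh", "Old", "Hazard"), 3),
                   fac3 = c(rep("Red", 3), rep("Yellow", 3), rep("Purple", 3)),
                   val = c(2, 3, 1, 4, 5, 1, 1, 1, 3),
                   stringsAsFactors = TRUE)

# Keep only the Fresh/Old rows, then sum val per fac1/fac3 combination
res <- aggregate(val ~ fac1 + fac3,
                 data = subset(mydf, fac2 %in% c("Fresh", "Old")),
                 FUN = sum)

# Rename the summed column to match the desired output
names(res)[names(res) == "val"] <- "sum.freshold"
print(res)
```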