Sum up the values of specific factor levels by each group in a data.frame


Let's say I have a data.frame of the following structure:

    fac1   fac2   fac3 val
1  Apple  Fresh    Red   2
2  Apple    Old    Red   3
3  Apple Hazard    Red   1
4 Banana  Fresh Yellow   4
5 Banana    Old Yellow   5
6 Banana Hazard Yellow   1
7  Berry  Fresh Purple   1
8  Berry    Old Purple   1
9  Berry Hazard Purple   3

I want to sum val over the rows where fac2 equals Fresh or Old, separately for each level of fac1, and end up with a data frame like this:

    fac1    fac3 sum.freshold
1  Apple    Red             5
2 Banana Yellow             9  
3  Berry Purple             2

Moreover, I want to specify the factor levels in the condition by their labels (i.e., "Fresh" and "Old"), not by the underlying integer codes that represent them (i.e., 1 and 3, as the str() output shows):

> str(mydf$fac2)
 Factor w/ 3 levels "Fresh","Hazard",..: 1 3 2 1 3 2 1 3 2

Here is the example:

mydf <- data.frame(fac1 = c(rep("Apple", 3), rep("Banana", 3), rep("Berry", 3)), 
                   fac2 = rep(c("Fresh", "Old", "Hazard"), 3),
                   fac3 = c(rep("Red", 3), rep("Yellow", 3), rep("Purple", 3)), 
                   val = c(2,3,1,4,5,1,1,1,3), 
                   stringsAsFactors = TRUE)

One attempt of mine, which does not work and would not return a data.frame anyway:

tapply(mydf$val, mydf$fac1, function(x) {x[mydf$fac2 == "Fresh"] + x[mydf$fac2 == "Old"]})

Solution

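  • A dplyr (tidyverse) solution:

    library(dplyr)  # the .by argument requires dplyr >= 1.1.0
    
    mydf %>% 
      # keep only the rows whose fac2 label is "Fresh" or "Old"
      filter(fac2 %in% c("Fresh", "Old")) %>% 
      # sum val within each fac1/fac3 combination
      summarise(
        .by = c(fac1, fac3), 
        sum.freshold = sum(val))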
    

    Output:

        fac1   fac3 sum.freshold
    1  Apple    Red            5
    2 Banana Yellow            9
    3  Berry Purple            2
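
  • A base R sketch of the same filter-then-sum idea, in case dplyr is not available. The attempt in the question fails because, inside tapply(), x holds only the three values of one group, while mydf$fac2 == "Fresh" is a logical vector over all nine rows, so the index no longer lines up with x; `+` also adds element-wise rather than summing. Filtering the rows first avoids both problems. The names keep, sums and res are illustrative, and the final reordering is only there to match the row order of the expected result:

```r
# Example data from the question
mydf <- data.frame(
  fac1 = rep(c("Apple", "Banana", "Berry"), each = 3),
  fac2 = rep(c("Fresh", "Old", "Hazard"), times = 3),
  fac3 = rep(c("Red", "Yellow", "Purple"), each = 3),
  val  = c(2, 3, 1, 4, 5, 1, 1, 1, 3),
  stringsAsFactors = TRUE
)

# Comparing a factor with character strings matches on the labels,
# not on the underlying integer codes
keep <- mydf$fac2 %in% c("Fresh", "Old")

# Corrected tapply(): subset the rows first, then sum per group.
# This returns a named array (Apple 5, Banana 9, Berry 2), not a data.frame.
sums <- with(mydf[keep, ], tapply(val, fac1, sum))

# aggregate() returns a data.frame and keeps fac3 as a grouping column
res <- aggregate(val ~ fac1 + fac3, data = mydf[keep, ], FUN = sum)
names(res)[names(res) == "val"] <- "sum.freshold"

# aggregate() orders rows by the grouping columns; reorder by fac1
res <- res[order(res$fac1), ]
rownames(res) <- NULL
res
```

    Because fac3 is constant within each level of fac1 here, grouping by both columns gives one row per fruit, as in the dplyr version.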