Search code examples
rfunctionaggregatesummary

aggregate different rows by different functions in R


I have the following data frame: enter image description here

(dput() for testing bellow)

    structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "797 Fleet", class = "factor"), 
    V2 = structure(c(5L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 3L, 2L, 
    5L, 1L, 4L, 3L, 2L, 5L), .Label = c("Available Hours", "Cycle Time", 
    "Performance", "Production time", "Units"), class = "factor"), 
    V3 = c(51, 2989.601111, 2498.85, 540.8754973, 39.93337086, 
    52, 30010.73389, 24946.62833, 529.4659407, 40.81742793, 36, 
    20778.5525, 17174.18722, 535.7960907, 40.36234152, 19)), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -16L))

i need to aggregate the data but with different functions for different features; for the 797 fleet, the units and production time should be added, but the performance an cycle time should be averaged.

I have just tried aggregate with two functions, but i get two columns one with all added and other with all averaged and i need only one column.

How can i accomplish that?


Solution

  • Here's an idea using data.table:

    library(data.table)
    fun_list <- list("Units" = sum, "Production time" = sum, "Performance" = mean, "Cycle Time" = mean)
    setDT(df)[V2 %in% names(fun_list), .(res = fun_list[[as.character(.BY[[2]])]](V3)),by = .(V1, V2)]
    
    #          V1              V2         res
    #1: 797 Fleet           Units   158.00000
    #2: 797 Fleet Production time 44619.66555
    #3: 797 Fleet     Performance   535.37918
    #4: 797 Fleet      Cycle Time    40.37105
    

    Let's unpack this solution a little. First, we store a map of the functions we want to apply to each of the values in V2. This list is simply a list of functions. E.g. "Units" = sum means we want to apply sum to the "Units" group. To see how this works try: fun_list[["Units"]](c(1,2,3)).

    We then use this in our group by operation in data.table. We use the V2 value stored in .BY to index our function list. That is for each V2 value we choose a function from our list to apply. This is accomplished by fun_list[[as.character(.BY[[2]])]] (Note we need to as.character since .BY is a factor). Lastly we apply that function to V3 which is what (V3) does in the last part of the code fun_list[[as.character(.BY[[2]])]](V3))!