Efficient way to apply a grouping vector in R


Suppose I have some grouping information in the form of a grouping vector:

group = c(1,1,2,2,3,3,3)

This describes three groups: two of size 2 and one of size 3. Now suppose I have a vector of values (the numbers are arbitrary):

x = c(1.5, 3.1, 5.4, -4.5, 2.2, 4.4, 1.1)

Is there an efficient way in R to loop over this vector and apply a function within each group?

For example, summing within each group with a for loop would be:

sums = rep(0, 3)
for (i in 1:3) {
  grp_ids = which(group == i)
  sums[i] = sum(x[grp_ids])
}

Is there an easier way to do this?


Solution

  • You can use group_by from {dplyr}:

    library(dplyr)
    
    group = c(1, 1, 2, 2, 3, 3, 3)
    x = c(1.5, 3.1, 5.4, -4.5, 2.2, 4.4, 1.1)
    
    df <- data.frame(group, x)
    
    result <- df %>%
      group_by(group) %>%
      summarize(sums = sum(x))
    
    > print(result)
    # A tibble: 3 × 2
      group  sums
      <dbl> <dbl>
    1     1   4.6
    2     2   0.9
    3     3   7.7
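
If loading a package is not desirable, base R can probably do the same job. The following is a minimal sketch, not part of the original answer, that assumes the same `group` and `x` vectors as above. It uses `tapply()` for functions that return one value per group and `split()` with `lapply()` for functions that return several values. The names `means` and `ranges` are illustrative only.

```r
# Base R only; no packages needed.
group <- c(1, 1, 2, 2, 3, 3, 3)
x <- c(1.5, 3.1, 5.4, -4.5, 2.2, 4.4, 1.1)

# tapply() splits x by group and applies the function to each piece.
# The result is named after the group values ("1", "2", "3").
sums <- tapply(x, group, sum)

# Any function that returns a single value per group can be swapped in.
means <- tapply(x, group, mean)

# split() returns the per-group pieces as a list, which suits functions
# that return more than one value per group (here: min and max).
ranges <- lapply(split(x, group), range)

print(sums)
print(means)
```

Unlike the for loop in the question, this does not require knowing the number of groups in advance, and the group labels do not have to be the integers 1 to 3.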