
Divide each value by the sum of values by group


I have the following data frame, where "x" is a grouping variable and "y" some values:

dat <- data.frame(x = c(1, 2, 3, 3, 2, 1), y = c(3, 4, 4, 5, 2, 5))

I want to create a new column where each "y" value is divided by the sum of "y" within each group defined by "x". E.g. the result for the first row is 3 / (3 + 5) = 0.375, where the denominator is the sum of "y" values for group 1 (x = 1).
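
To make the expected arithmetic concrete, here is a small sketch that computes the per-group denominators and checks the first row by hand. The names `group_sums` and `first_share` are only illustrative and are not part of the original question.

```r
dat <- data.frame(x = c(1, 2, 3, 3, 2, 1), y = c(3, 4, 4, 5, 2, 5))

# Sum of y within each level of x; these are the denominators
# (group 1: 3 + 5 = 8, group 2: 4 + 2 = 6, group 3: 4 + 5 = 9)
group_sums <- tapply(dat$y, dat$x, sum)

# Expected value for the first row: y = 3 in group x = 1
first_share <- dat$y[1] / unname(group_sums["1"])
first_share
# [1] 0.375
```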


Solution

  • There are several ways to solve this. Here is one in base R, using ave():

    with(dat, ave(y, x, FUN = function(v) v / sum(v)))
    ## [1] 0.3750000 0.6666667 0.4444444 0.5555556 0.3333333 0.6250000
    

    Another possibility uses data.table, which adds the new column z to dat by reference:

    library(data.table)
    setDT(dat)[, z := y/sum(y), by = x]
    dat
    #    x y         z
    # 1: 1 3 0.3750000
    # 2: 2 4 0.6666667
    # 3: 3 4 0.4444444
    # 4: 3 5 0.5555556
    # 5: 2 2 0.3333333
    # 6: 1 5 0.6250000
    

    A third option uses dplyr (the result is returned rather than stored, so assign it if you want to keep it):

    library(dplyr)
    dat %>%
      group_by(x) %>%
      mutate(z = y/sum(y))
    
    # Source: local data frame [6 x 3]
    # Groups: x
    # 
    #   x y         z
    # 1 1 3 0.3750000
    # 2 2 4 0.6666667
    # 3 3 4 0.4444444
    # 4 3 5 0.5555556
    # 5 2 2 0.3333333
    # 6 1 5 0.6250000
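
Whichever approach is used, a quick sanity check is that the shares within each group add up to 1. The following is a minimal base R sketch of that check, assuming the same `dat` as in the question; the name `share_totals` is illustrative. It uses `ave()` with `FUN = sum` to get each row's group total and divides by it, which is a small variation on the first approach rather than a separate method.

```r
dat <- data.frame(x = c(1, 2, 3, 3, 2, 1), y = c(3, 4, 4, 5, 2, 5))

# ave() with FUN = sum returns each row's group total, aligned with the rows
dat$z <- dat$y / ave(dat$y, dat$x, FUN = sum)

# Sanity check: the shares within each group should add up to 1
share_totals <- tapply(dat$z, dat$x, sum)
stopifnot(isTRUE(all.equal(as.vector(share_totals), c(1, 1, 1))))
```

Comparing with `all.equal()` instead of `==` avoids false alarms from floating-point rounding.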