Search code examples
rdataframegroupingfrequency

Calculated column based in a subtotal


I have a dataframe with frequencies like this:

data <- data.frame(id = c("1","2","3","4","5","6","7","8"),
                   level = c("high", "high", "high", "med", "med", "med", "low", "low"),
                   n = c(24, 48, 57, 79, 2, 69, 37, 82))

What I'm trying to do is to create a new column to get the percentage of each observation but taking as reference the level subtotal. This is my solution, but trying to make it more efficient (the more values in level, the more lines to calculate subtotals):

high <- sum(data$n[data$level == "high"])
med <- sum(data$n[data$level == "med"])
low <- sum(data$n[data$level == "low"])

data$prop <- NA
data$prop[data$level == "high"] <- (data$n/high)*100
data$prop[data$level == "med"] <- (data$n/med)*100
data$prop[data$level == "low"] <- (data$n/low)*100

Solution

  • Just for laffs, here's an answer in base:

    transform(data, percent = ave(n, level, FUN = function(x) 100 * prop.table(x)))
    
    #   id level  n   percent
    # 1  1  high 24 18.604651
    # 2  2  high 48 37.209302
    # 3  3  high 57 44.186047
    # 4  4   med 79 52.666667
    # 5  5   med  2  1.333333
    # 6  6   med 69 46.000000
    # 7  7   low 37 31.092437
    # 8  8   low 82 68.907563