
Running equations by categories along a column in R


I currently have a dataframe that looks like this:

tree cookie height radius
C1T1   A     0.37   12.3
C1T1   B     0.65   14.2
C1T1   C     0.91   16
C1T2   A     0.2    4
C1T2   B     0.5    10
C1T2   C     0.75   12.4

I would like to add a "volume" column to this dataframe. The equation for the volume column is: (1/3) * pi * height * (radius1^2 + radius2^2 + (radius1*radius2)) (this is the volume of a frustum!). I would like to run this equation for each tree, where height is the height of the cookie plus the heights of the cookies that came before it (so for tree C1T1 cookie C the height would be 0.91+0.65+0.37), radius1 is the cookie's own radius, and radius2 is the radius of the cookie that comes before it (so again for C1T1 cookie C, radius2 would be the radius of cookie C1T1 B). The first cookie of each tree has no previous cookie, so its height would not need to be added to anything, and radius2 could be its own radius used again (the same value for radius1 and radius2). Any suggestions as to how to do this would be greatly appreciated!


Solution

  • library(tidyverse)
    
    # Example data from the question; radius1 is the question's radius column
    df <- tribble(
      ~tree, ~cookie, ~height, ~radius1,
      "C1T1", "A", 0.37, 12.3,
      "C1T1", "B", 0.65, 14.2,
      "C1T1", "C", 0.91, 16,
      "C1T2", "A", 0.2, 4,
      "C1T2", "B", 0.5, 10,
      "C1T2", "C", 0.75, 12.4
    )
    
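    df <- df %>%
      group_by(tree) %>%
      # Sort by height just to be safe, so cookies are in order within each tree
      arrange(tree, height) %>%
      mutate(
        # Running total of cookie heights within each tree
        cumheight = cumsum(height),
        # Radius of the previous cookie in the same tree (NA for the first cookie)
        radius2 = lag(radius1),
        # The first cookie has no predecessor, so it reuses its own radius
        radius2 = if_else(is.na(radius2), radius1, radius2),
        # Frustum volume
        volume = 1/3 * pi * cumheight * (radius1^2 + radius2^2 + (radius1 * radius2))
      ) %>%
      # Drop the grouping so later operations are not applied per tree
      ungroup()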
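
If loading the tidyverse is not an option, the same per-tree calculation can be sketched in base R. This is only an illustrative alternative to the solution above, not part of it: it assumes the same column names (with the question's radius column called radius1) and, like the solution, assumes that sorting by height puts the cookies in the right order within each tree.

```r
# Same example data as above, built with base R only
df <- data.frame(
  tree    = c("C1T1", "C1T1", "C1T1", "C1T2", "C1T2", "C1T2"),
  cookie  = c("A", "B", "C", "A", "B", "C"),
  height  = c(0.37, 0.65, 0.91, 0.2, 0.5, 0.75),
  radius1 = c(12.3, 14.2, 16, 4, 10, 12.4)
)

# Order rows so cookies are in sequence within each tree
df <- df[order(df$tree, df$height), ]

# Running total of heights within each tree
df$cumheight <- ave(df$height, df$tree, FUN = cumsum)

# Previous cookie's radius within each tree; the first cookie reuses its own
df$radius2 <- ave(df$radius1, df$tree,
                  FUN = function(r) c(r[1], r[-length(r)]))

# Frustum volume
df$volume <- 1/3 * pi * df$cumheight *
  (df$radius1^2 + df$radius2^2 + df$radius1 * df$radius2)

print(df)
```

As a sanity check, C1T1 cookie C gets cumheight = 0.37 + 0.65 + 0.91 = 1.93, radius1 = 16 and radius2 = 14.2, so its volume is 1/3 * pi * 1.93 * (256 + 201.64 + 227.2), which is roughly 1384.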