R cumulative sum of one variable while another variable tracks "cutoff"


What I'd like to do is find the sum of the sepal lengths of the iris flowers whose petal width / petal length ratio is at or above each "critical value".

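Consider the following code:

library(tidyverse)
data("iris")

# Ratio of petal width to petal length for every flower
iris <- iris %>%
  mutate(prop_width_length = Petal.Width / Petal.Length)

prop_width_length <- as.data.frame(iris$prop_width_length)

# Critical values from 0 to 1 in steps of 0.001 (1001 values)
portion <- as.data.frame(seq(0, 1, 0.001))
cumsum <- NULL

# For each critical value, count the flowers whose ratio is >= that value
for (i in 1:1001) {
  cumsum[i] <- sum(prop_width_length >= portion[i, 1])
}

sigportion <- cbind(portion, cumsum)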

That gives me, for each "critical value", a count of how many of my iris flowers have a width/length proportion greater than or equal to it. Finally, it puts the result in a data frame so I can make a nice ggplot.

What I'd like to do, in addition to the above code, is add up the sepal lengths of every iris whose petal width/length ratio is greater than or equal to each "critical value" stored in my portion variable.

So something like:

sum all the sepal lengths of iris flowers which have petal width/length >= critvalue

Solution

  • It becomes quite easy with data.table

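    library(data.table)
    iris <- as.data.table(iris)
    # Add the petal width / length ratio as a new column, by reference
    iris[, prop_width_length := Petal.Width / Petal.Length]
    portion <- as.data.table(seq(from = 0, to = 1, by = 0.001))
    cumsum <- numeric(nrow(portion))
    # For each critical value, sum Sepal.Length over the rows at or above it
    for (i in 1:nrow(portion)) {
      cumsum[i] <- iris[prop_width_length >= portion[[1]][i], sum(Sepal.Length)]
    }
    sigportion <- cbind(portion, cumsum)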
    

    Hope that helps!
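
The same per-threshold sum can also be written without an explicit loop. The following is only a sketch in base R, not part of the answer above: it assumes the same ratio and the same grid of critical values, and the names `iris_df`, `ratio` and `sepal_sum` are made up for illustration.

```r
# Sketch: sum of Sepal.Length at or above each critical value, in base R.
# iris_df, ratio and sepal_sum are illustrative names, not from the original code.
iris_df <- datasets::iris
ratio <- iris_df$Petal.Width / iris_df$Petal.Length

# Same grid of critical values as in the question
portion <- seq(from = 0, to = 1, by = 0.001)

# For each critical value p, keep the flowers with ratio >= p and add up
# their sepal lengths; vapply() returns one number per critical value.
sepal_sum <- vapply(
  portion,
  function(p) sum(iris_df$Sepal.Length[ratio >= p]),
  numeric(1)
)

sigportion <- data.frame(portion = portion, sepal_sum = sepal_sum)
```

Using a name such as `sepal_sum` rather than `cumsum` avoids masking the base R function `cumsum()`, and the resulting data frame can be passed to ggplot in the same way as before.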