
Rolling linear regression in R to find the optimal fit


I am working with bacterial growth data. At some point the bacteria grow exponentially, which gives a straight line when you take the logarithm of the values.
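To make the straight-line claim explicit: assume the population size follows $N(t) = N_0 e^{rt}$ during the exponential phase, with $N_0$ the starting value and $r$ the growth rate (symbols introduced here for illustration). Taking the natural logarithm then gives a line in $t$ with slope $r$:

$$
N(t) = N_0 e^{rt} \quad\Longrightarrow\quad \ln N(t) = \ln N_0 + r\,t
$$

Leaving exponential growth therefore shows up as the log values bending away from that line, which is what a drop in R-squared is meant to detect.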

Now I want an algorithm that automatically detects when they leave the exponential phase. I decided to try writing a recursive function that monitors changes in the R-squared value.

library('datasets')
library('tidyverse')
data(iris)
summary(iris)

iris %>% ggplot(aes(x=Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth(method = lm)


x <- 1
rolling_func <- function(df) {
  rsquare_1 <- summary(lm(Petal.Length[x:x+5]~Sepal.Length[1:x], data = iris))[[8]]
  rsquare_2 <- summary(lm(Petal.Length[x+5:x+10]~Sepal.Length[1:x], data = iris))[[8]]
  if (rsquare_2 < rsquare_1){
    return(x)
  } else {
    x <- x+5
    rolling_func(df)
  }
}

rolling_func(iris)

However, when I try to run this, I get the following error:

Error: C stack usage 7969828 is too close to the limit
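An error like this usually means the recursion never reaches a base case. In the posted function, `x <- x+5` only changes a local copy of `x`, so every recursive call `rolling_func(df)` reads the global `x <- 1` again and repeats the same comparison. The indices also need parentheses: `x:x+5` parses as `(x:x) + 5`, a single index, rather than `x:(x+5)`, and `Sepal.Length[1:x]` does not cover the same rows as the left-hand side. A minimal sketch of the same recursive idea, assuming adjacent non-overlapping windows of 5 rows (the `step` argument and the end-of-data check are additions for illustration), passes `x` along explicitly:

```r
# Sketch of the recursive approach with x passed as an argument.
# `step` (window size) is a hypothetical parameter; the question used 5.
rolling_func <- function(df, x = 1, step = 5) {
  idx1 <- x:(x + step - 1)                # current window of rows
  idx2 <- (x + step):(x + 2 * step - 1)   # the next window of rows
  # base case: stop when the next window would run past the data
  if (max(idx2) > nrow(df)) {
    return(NA_integer_)
  }
  rsq1 <- summary(lm(Petal.Length ~ Sepal.Length, data = df[idx1, ]))$r.squared
  rsq2 <- summary(lm(Petal.Length ~ Sepal.Length, data = df[idx2, ]))$r.squared
  if (rsq2 < rsq1) {
    return(x)  # first row of the last window before R-squared drops
  }
  rolling_func(df, x + step, step)  # move on to the next pair of windows
}

rolling_func(iris)
#> [1] 6
```

Passing `x` as an argument gives each call its own position, and the `max(idx2) > nrow(df)` check guarantees that the recursion stops even if R-squared never drops.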

Basically, I want my code to detect the point at which x and y are no longer linearly correlated.

Thank you very much for your help.


Solution

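  • Here's an approach using dplyr and a for loop instead of recursion. It fits the regression on the first 3 rows, then the first 6, and so on (an expanding window), and stops as soon as R-squared decreases: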

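    library(dplyr)
    
    chunksize <- 3
    # vector() creates a logical vector (all FALSE); entries that are never
    # filled because the loop breaks early print as 0 below
    running <- vector(length = nrow(iris)/chunksize)
    
    for (i in 1:(nrow(iris)/chunksize)) {
       # fit on the first i * chunksize rows (expanding window)
       abc <- iris %>% 
          slice(1:(i * chunksize)) %>% 
          lm(Petal.Length ~ Sepal.Length, data = .)
       running[[i]] <- summary(abc)$r.squared
       # stop as soon as R-squared is lower than for the previous chunk
       if (i > 1) {
          if (running[[i]] < running[[i - 1]]) {
             break
          }
       }
       # printed on every iteration that does not break, so the last
       # message shows the row count just before the drop
       print(paste("Number of iris rows after which r squared goes down is",
                   i*chunksize))
    }
    
    #> [1] "Number of iris rows after which r squared goes down is 3"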
    
    running
    
    #>  [1] 0.7500000 0.3963221 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #>  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [22] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [29] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [36] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [43] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
    #> [50] 0.0000000
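The loop above always starts its window at row 1, so early points keep influencing every fit. For a rolling (sliding) window, as in the question title, each fit uses only `width` consecutive points. A minimal base-R sketch, assuming a fixed window width and an R-squared threshold instead of "any decrease" (`rolling_rsq()`, `width` and `threshold` are hypothetical names, and both values are arbitrary and need tuning for real data):

```r
# Sketch: sliding-window ("rolling") R-squared, one fit per window of
# `width` consecutive points, moving forward one point at a time.
rolling_rsq <- function(x, y, width = 10) {
  stopifnot(length(x) == length(y), width >= 3, width <= length(x))
  starts <- seq_len(length(x) - width + 1)
  sapply(starts, function(i) {
    rows <- i:(i + width - 1)  # window i covers points i .. i + width - 1
    xs <- x[rows]
    ys <- y[rows]
    summary(lm(ys ~ xs))$r.squared
  })
}

rsq <- rolling_rsq(iris$Sepal.Length, iris$Petal.Length, width = 10)

# first window whose R-squared falls below the (arbitrary) threshold;
# NA if no window does
threshold <- 0.5
first_below <- which(rsq < threshold)[1]
```

For growth curves, `x` would be time and `y` the log of the measured values. Comparing against a threshold is likely less sensitive to small, noise-driven dips than stopping at the first decrease.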