Tags: r, statistics, predict, nls

Handling Error Propagation Above Biological Thresholds in R with predictNLS


I'm working on a nonlinear regression analysis using R and have encountered an issue with error propagation. Specifically, I'm using predictNLS to estimate the errors of my predictions, but in some cases, the error ranges exceed biologically plausible thresholds (e.g., prediction intervals going above 1 or below 0, which doesn't make sense in my biological context).

Below is a reproducible example of my code. It simulates an exponential relationship, fits a nonlinear least squares (NLS) model, and then uses predictNLS to estimate prediction errors. My challenge is handling instances where the error propagation results in biologically impossible values.

Reproducible Example:

# simulate exponential relationship
set.seed(123)
# generate random x values between 0 and 60
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)

data <- data.frame(sr = x, fipar = y)

# create a nls model to fit the data
model <- nls(fipar ~ 1 - exp(-a * sr), data = data, start = list(a = 0.001))

# create an observed and predicted dataframe
data$predicted <- predict(model, data)

library(ggplot2)
library(dplyr)  # provides the %>% pipe used below

data %>%
  ggplot(aes(x = sr, y = fipar)) + 
  geom_point() +
  geom_line(aes(y = predicted), color = "red")

# estimate the errors using predictNLS (from the propagate package)
library(propagate)

newdat <- data.frame(sr = seq(1, 60, 1))

prediction_se <- predictNLS(model, newdata = newdat, interval = "prediction", type = 'response')

prediction_se$summary$sr <- newdat$sr

prediction_se$summary %>%
  ggplot(aes(x = sr, y = Prop.Mean.1)) + 
  ylim(0, 1.2) +
  geom_point() +
  geom_ribbon(aes(ymin = `Prop.2.5%`, ymax = `Prop.97.5%`), alpha = 0.2) +
  geom_hline(yintercept = 1)

A ggplot2 scatter plot of the predicted mean against the model input (sr), with the 95% prediction interval shown as a ribbon. The horizontal line at y = 1 marks the threshold above which any predicted value or interval limit is biologically impossible.

My natural reaction would be to simply replace any error value > 1 with 1, but I'm guessing this overlooks a bunch of statistical assumptions. Are there any recommended approaches to constrain the prediction intervals, or to adjust the error estimation, so that they respect biological constraints? Perhaps this question would be better suited to the stats Stack Exchange, but I figured I'd look for an answer here too.
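For reference, the naive clamping I have in mind would look something like the sketch below (purely illustrative; I suspect it ignores the distributional assumptions behind the propagated intervals):

# naive fix: clamp the propagated interval to the biological range [0, 1]
clamped <- prediction_se$summary
clamped$`Prop.2.5%` <- pmax(clamped$`Prop.2.5%`, 0)
clamped$`Prop.97.5%` <- pmin(clamped$`Prop.97.5%`, 1)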


Solution

  • A GLM with a logit link function worked:

    # Fit the GLM; the logit link keeps fitted values within (0, 1)
    model <- glm(fipar ~ sr,
                 data = data,
                 family = binomial(link = "logit"))

    # Create an observed and predicted dataframe (call predict() once and reuse it)
    pred <- predict(model, data, type = "response", se.fit = TRUE)
    data$predicted <- pred$fit
    data$se <- pred$se.fit
    
    # Plot the data and the model's predicted probabilities
    library(ggplot2)
    ggplot(data, aes(x = sr, y = fipar)) + 
      geom_point() +
      geom_line(aes(y = predicted), color = "red")+
      geom_ribbon(aes(ymin = predicted - se, ymax = predicted + se), alpha = 0.2)
    

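    Since predicted ± se on the response scale can still dip slightly outside [0, 1] near the boundaries, a possible refinement (not part of the original answer) is to build the band on the link scale and back-transform it with plogis, which keeps it inside (0, 1) by construction. A minimal sketch, reusing the GLM model and data from above:

    # build an approximate 95% band on the logit (link) scale, then back-transform;
    # plogis() maps the link scale to (0, 1), so the ribbon cannot cross the bounds
    link_pred <- predict(model, data, type = "link", se.fit = TRUE)
    data$lwr <- plogis(link_pred$fit - 1.96 * link_pred$se.fit)
    data$upr <- plogis(link_pred$fit + 1.96 * link_pred$se.fit)

    ggplot(data, aes(x = sr, y = fipar)) +
      geom_point() +
      geom_line(aes(y = predicted), color = "red") +
      geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2)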