I'm working on a nonlinear regression analysis using R and have encountered an issue with error propagation. Specifically, I'm using predictNLS
to estimate the errors of my predictions, but in some cases, the error ranges exceed biologically plausible thresholds (e.g., prediction intervals going above 1 or below 0, which doesn't make sense in my biological context).
Below is a reproducible example of my code. It simulates an exponential relationship, fits a nonlinear least squares (NLS) model, and then uses predictNLS
to estimate prediction errors. My challenge is handling instances where the error propagation results in biologically impossible values.
Reproducible Example:
# simulate exponential relationship
set.seed(123)
# generate random x values between 0 and 60
x <- runif(100, 0, 60)
y <- 1 - exp(-0.075 * x) * rnorm(100, 0.7, 0.1)
data = data.frame(sr= x, fipar = y)
# create a nls model to fit the data
model <- nls(fipar ~ 1 - exp(-a * sr), data = data, start = list(a = 0.001))
# create an observed and predicted dataframe
data$predicted <- predict(model, data)
library(ggplot2)
data %>%
ggplot(aes(x = sr, y = fipar)) +
geom_point() +
geom_line(aes(y = predicted), color = "red")
# estimate the errors using predictNLS
newdat = data.frame(sr = seq(1, 60, 1))
prediction_se <- predictNLS(model, newdata = newdat, interval = "prediction", type = 'response')
prediction_se$summary$sr <- newdat$sr
prediction_se$summary %>%
ggplot(aes(x = sr, y = Prop.Mean.1)) +
ylim(0, 1.2) +
geom_point() +
geom_ribbon(aes(ymin = `Prop.2.5%`, ymax = `Prop.97.5%`), alpha = 0.2) +
geom_hline(yintercept = 1)
My natural reaction to this would be to simply replace any error value > 1 with a 1, but I'm guessing this overlooks a bunch of statistical assumptions. Are there any recommended approaches to constrain the prediction intervals or adjust the error estimation to reflect biological constraints?Perhaps my question would be better off in the stats stack exchange, but I figure I'll look for an answer here too..
A glm using a logit link function worked:
# Fit the GLM model
model <- glm(fipar ~ sr,
data = data,
family = binomial(link = "logit"))
# Create an observed and predicted dataframe
data$predicted <- predict(model, data, type = "response",se.fit = T)$fit
data$se <- predict(model, data, type = "response",se.fit = T)$se.fit
# Plot the data and the model's predicted probabilities
library(ggplot2)
ggplot(data, aes(x = sr, y = fipar)) +
geom_point() +
geom_line(aes(y = predicted), color = "red")+
geom_ribbon(aes(ymin = predicted - se, ymax = predicted + se), alpha = 0.2)