I collected ventilation volume data as a function of efficiency. Several samples were taken and fitted with a cubic polynomial in Excel, which gave a third-order regression equation.
However, as you can see from the picture, the fitted ventilation volume at 90-95% efficiency is higher than the value at 100%. The data should never exceed the value at 100%, but the automatically fitted cubic has a local maximum, so the curve bulges above the 100% value.
Is there a way to flatten that maximum and still fit the data? I want to use the measured data as they are, but the fitted curve must not exceed the value at 100%.
Solutions in R or other statistical software are welcome. A slightly lower R value for the fit is acceptable.
Thank you.
Here are a few ideas in R:
First, I'm making some example data that are similar to yours and fitting a linear model with x^3, x^2, and x as predictors:
# make example data (seed fixed so the example is reproducible)
set.seed(1)
xx = rep(c(30, 50, 70, 100), each = 10)
yy = 1/(1+exp(-(xx-50)/15)) * 4798.20 + rnorm(length(xx), sd = 20)
xx = c(0, xx)
yy = c(0, yy)
# fit third-order linear model
m0 = lm(yy ~ I(xx^3) + I(xx^2) + xx)
x_to_predict = data.frame(xx = seq(0, 100, length.out = length(xx)))
lm_preds = predict(m0, newdata = x_to_predict)
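To see whether a cubic fit like this one overshoots, one option is to compare its highest prediction on a fine grid with its prediction at 100% efficiency. The following is only a sketch on the simulated data above; the seed, the grid spacing, and the names `grid`, `cubic_preds`, `overshoot`, and `peak_x` are assumptions made for illustration.

```r
# Simulated data in the same form as the example above (seed is an assumption)
set.seed(1)
x_obs <- rep(c(30, 50, 70, 100), each = 10)
y_obs <- 1 / (1 + exp(-(x_obs - 50) / 15)) * 4798.20 + rnorm(length(x_obs), sd = 20)
xx <- c(0, x_obs)
yy <- c(0, y_obs)

# Third-order polynomial fit, as in the original model
m0 <- lm(yy ~ I(xx^3) + I(xx^2) + xx)

# Predict on a fine grid so that a local maximum between data points is visible
grid <- data.frame(xx = seq(0, 100, by = 0.5))
cubic_preds <- unname(predict(m0, newdata = grid))

# Overshoot: how far the highest prediction lies above the prediction at x = 100
pred_at_100 <- cubic_preds[length(cubic_preds)]
overshoot <- max(cubic_preds) - pred_at_100
peak_x <- grid$xx[which.max(cubic_preds)]

cat("peak at x =", peak_x, "; overshoot above the x = 100 prediction:", overshoot, "\n")
```

An overshoot of zero means the curve peaks at 100%; a positive value means the polynomial has a maximum somewhere below 100%, which is the problem described in the question.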
Idea 1: You could fit a model that uses a sigmoid (or other monotonic) curve.
# fit quasibinomial model for proportion
# first scale response variable between 0 and 1
m1 = glm(I(yy/max(yy)) ~ xx , family = quasibinomial())
# predict
preds_glm = predict(m1,
                    newdata = x_to_predict,
                    type = "response")
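If a sigmoid shape is too restrictive, another monotonic option is isotonic regression, which is available in base R as `isoreg()`. This is a swapped-in technique, not part of the ideas above, and the sketch makes two simplifying assumptions: the replicates at each efficiency are averaged first (which ignores the unequal group sizes), and the fitted values are joined by linear interpolation. The names `ux`, `ybar`, `ir`, and `iso_preds` are illustrative.

```r
# Simulated data in the same form as the example above (seed is an assumption)
set.seed(1)
x_obs <- rep(c(30, 50, 70, 100), each = 10)
y_obs <- 1 / (1 + exp(-(x_obs - 50) / 15)) * 4798.20 + rnorm(length(x_obs), sd = 20)
xx <- c(0, x_obs)
yy <- c(0, y_obs)

# Average the replicates at each distinct efficiency to avoid tied x values
ux <- sort(unique(xx))
ybar <- as.numeric(tapply(yy, xx, mean))

# Isotonic (non-decreasing) least-squares fit to the group means
ir <- isoreg(ux, ybar)

# Linear interpolation between the isotonic fitted values gives a
# non-decreasing curve that cannot rise above the largest fitted value
x_grid <- seq(0, 100, by = 0.5)
iso_preds <- approx(ux, ir$yf, xout = x_grid)$y

plot(xx, yy, ylab = "result", xlab = "efficiency")
lines(x_grid, iso_preds, col = "darkgreen", lwd = 2)
```

The resulting curve is piecewise linear rather than smooth, but it never decreases and stays at or below the largest observed value.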
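Idea 2: Fit a generalized additive model (GAM), which produces a smooth curve. Note that a plain smooth term is not constrained to be monotonic, so check the fitted curve for overshoot.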
# fit Generalized Additive Model
library(mgcv)
# you have to tune "k" somewhat -- larger means more "wiggliness"
m2 = gam(yy ~ s(xx, k = 4))
gam_preds = predict(m2,
                    newdata = x_to_predict,
                    type = "response")
Here's what the plots for each model look like:
# plot data and predictions
plot(xx, yy, ylab = "result", xlab = "efficiency")
lines(x_to_predict$xx, preds_glm*max(yy),
      type = "l", col = 'red', lwd = 2)
lines(x_to_predict$xx, gam_preds,
      type = "l", col = 'blue', lwd = 2)
lines(x_to_predict$xx, lm_preds,
      type = "l", col = 'black', lwd = 2, lty = 2)
legend("bottomright",
       lty = c(0, 1, 1, 2),
       legend = c("data", "GLM prediction", "GAM prediction", "third-order lm"),
       pch = c(1, NA_integer_, NA_integer_, NA_integer_),
       col = c("black", "red", "blue", "black"))