How to reduce cubic equation's maximum peak (fitting)

The ventilation volume data were collected according to the efficiency. Several samples were taken and fitted into cubic equations. It was written in Excel, and a third regression equation was obtained.

However, as you can see from the picture, the ventilation volume at 90-95% is higher than 100%. The data should never be higher than 100%, but the maximum vertex of the auto regression is convex so that it exceeds 100% in the form of a curve.

Is there a way to reduce the maximum vertex and fit it? Use the measured data as it is, but do not exceed 100%.

The use of R or other statistical programs is also welcome. R values can be a little lower.

Thank you.

Solution

Here are a few ideas in R:

First, I'm making some example data that are similar to yours and fitting a linear model with x^3, x^2, and x as predictors:

#  make example data
xx = rep(c(30, 50, 70, 100), each = 10)
yy = 1/(1+exp(-(xx-50)/15))  * 4798.20 + rnorm(length(xx), sd = 20)
xx = c(0, xx)
yy = c(0, yy)

# fit third-order linear model
m0 = lm(yy ~ I(xx^3) + I(xx^2) + xx)

x_to_predict = data.frame(xx = seq(0, 100, length.out = length(xx)))
lm_preds = predict(m0, newdata = x_to_predict)

Idea 1: You could fit a model that uses a sigmoid (or other monotonic) curve.

# fit quasibinomial model for proportion
# first scale response variable between 0 and 1
m1 = glm(I(yy/max(yy)) ~ xx , family = quasibinomial())

# predict
preds_glm = predict(m1, 
                newdata = x_to_predict, 
                type = "response")

Idea 2: Fit a generalized additive model that will make a smooth curve.

# fit Generalized Additive Model
library(mgcv)
# you have to tune "k" somewhat -- larger means more "wiggliness"
m2 = gam(yy ~ s(xx, k = 4)) 
gam_preds = predict(m2, 
                    newdata = x_to_predict, 
        type = "response")

Here's what the plots for each model look like:

# plot data and predictions
plot(xx, yy, ylab = "result", xlab = "efficiency")
lines(x_to_predict$xx, 
      preds_glm*max(yy), "l", col = 'red', lwd = 2)
lines(x_to_predict$xx, 
      gam_preds, "l", col = 'blue', lwd = 2)
lines(x_to_predict$xx, lm_preds, 
      "l", col = 'black', lwd = 2, lty = 2)
legend("bottomright", 
       lty = c(0, 1, 1, 2), 
       legend = c("data", "GLM prediction", "GAM prediction", "third-order lm"), 
       pch = c(1, NA_integer_, NA_integer_, NA_integer_), 
       col = c("black", "red", "blue", "black"))