Search code examples
rexcelcurve-fittingvertex

How to reduce cubic equation's maximum peak (fitting)


The ventilation volume data were collected according to the efficiency. Several samples were taken and fitted into cubic equations. It was written in Excel, and a third regression equation was obtained.

Click to see picture

However, as you can see from the picture, the ventilation volume at 90-95% is higher than 100%. The data should never be higher than 100%, but the maximum vertex of the auto regression is convex so that it exceeds 100% in the form of a curve.

Is there a way to reduce the maximum vertex and fit it? Use the measured data as it is, but do not exceed 100%.

The use of R or other statistical programs is also welcome. R values ​​can be a little lower.

Thank you.


Solution

  • Here are a few ideas in R:

    First, I'm making some example data that are similar to yours and fitting a linear model with x^3, x^2, and x as predictors:

    #  make example data
    xx = rep(c(30, 50, 70, 100), each = 10)
    yy = 1/(1+exp(-(xx-50)/15))  * 4798.20 + rnorm(length(xx), sd = 20)
    xx = c(0, xx)
    yy = c(0, yy)
    
    # fit third-order linear model
    m0 = lm(yy ~ I(xx^3) + I(xx^2) + xx)
    
    x_to_predict = data.frame(xx = seq(0, 100, length.out = length(xx)))
    lm_preds = predict(m0, newdata = x_to_predict)
    

    Idea 1: You could fit a model that uses a sigmoid (or other monotonic) curve.

    # fit quasibinomial model for proportion
    # first scale response variable between 0 and 1
    m1 = glm(I(yy/max(yy)) ~ xx , family = quasibinomial())
    
    # predict
    preds_glm = predict(m1, 
                    newdata = x_to_predict, 
                    type = "response")
    

    Idea 2: Fit a generalized additive model that will make a smooth curve.

    # fit Generalized Additive Model
    library(mgcv)
    # you have to tune "k" somewhat -- larger means more "wiggliness"
    m2 = gam(yy ~ s(xx, k = 4)) 
    gam_preds = predict(m2, 
                        newdata = x_to_predict, 
            type = "response")
    

    Here's what the plots for each model look like:

    # plot data and predictions
    plot(xx, yy, ylab = "result", xlab = "efficiency")
    lines(x_to_predict$xx, 
          preds_glm*max(yy), "l", col = 'red', lwd = 2)
    lines(x_to_predict$xx, 
          gam_preds, "l", col = 'blue', lwd = 2)
    lines(x_to_predict$xx, lm_preds, 
          "l", col = 'black', lwd = 2, lty = 2)
    legend("bottomright", 
           lty = c(0, 1, 1, 2), 
           legend = c("data", "GLM prediction", "GAM prediction", "third-order lm"), 
           pch = c(1, NA_integer_, NA_integer_, NA_integer_), 
           col = c("black", "red", "blue", "black"))
    

    plot of three different models