Search code examples
rdrc

simple curve fitting in R


I am trying to find a fit for my data. But so far had no luck. Tried the logarithmic, different ones from the drc package .. but I am sure there must be a better one I just don't know the type. On a different note - I would be grateful for advice on how to go about curve hunting in general.

library(drc)    
df<-structure(list(x = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
        20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
        52, 53, 54, 55), y = c(0.1066, -0.6204, -0.2028, 0.2621, 0.4083, 
        0.4497, 0.6343, 0.7762, 0.8809, 1.0029, 0.8089, 0.7845, 0.8009, 
        0.9319, 0.9414, 0.9505, 0.9323, 1.0321, 0.9381, 0.8975, 1.0929, 
        1.0236, 0.9589, 1.0644, 1.0411, 1.0763, 0.9679, 1.003, 1.142, 
        1.1049, 1.2868, 1.1569, 1.1952, 1.0802, 1.2125, 1.3765, 1.263, 
        1.2507, 1.2125, 1.2207, 1.2836, 1.3352, 1.1311, 1.2321, 1.4277, 
        1.1645), w = c(898, 20566, 3011, 1364, 1520, 2376, 1923, 1934, 
        1366, 1010, 380, 421, 283, 262, 227, 173, 118, 113, 95, 69, 123, 
        70, 80, 82, 68, 83, 76, 94, 101, 97, 115, 79, 98, 84, 92, 121, 
        97, 102, 93, 92, 101, 74, 124, 64, 52, 63)), row.names = c(NA, 
        -46L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`47` = 47L), class = "omit"))
    
    fit <- drm(data = df,y ~ x,fct=LL.4(), weights = w)
    plot(fit)

enter image description here


Solution

  • 1) If we ignore the weights then y = a + b * x + c/x^2 seems to fit and is linear in the coefficients so is easy to fit. This seems upward sloping so we started with a line but then we needed to dampen that so we added a reciprocal term. A reciprocal quadratic worked slightly better than a plain reciprocal based on the residual sum of squares so we switched to that.

    fm <- lm(y ~ x + I(1 / x^2), df)
    coef(summary(fm))
    ##                  Estimate   Std. Error   t value     Pr(>|t|)
    ## (Intercept)  1.053856e+00  0.116960752  9.010341 1.849238e-11
    ## x            4.863077e-03  0.002718613  1.788808 8.069195e-02
    ## I(1/x^2)    -1.460443e+02 16.518887452 -8.841049 3.160306e-11
    

    The coefficient of the x term is not significant at the 5% level -- the p value is 8% in the table above -- so we can remove it and it will fit nearly as well giving a model with only two parameters. In the plot below the fm fit with 3 parameters is solid and the fm2 fit with 2 parameters is dashed.

    fm2 <- lm(y ~ I(1 / x^2), df)
    
    plot(y ~ x, df)
    lines(fitted(fm) ~ x, df)
    lines(fitted(fm2) ~ x, df, lty = 2)
    

    screenshot

    2) Another approach is to use two straight lines. This is still continuous but has one non-differentiable point at the point of transition. The model has 4 parameters, the intercepts and slopes of each line. Below we do use the weights. It has the advantage of an obvious motivation based on the appearance of the data. The break point at the intersection of the two lines may have significance as the transition point between the higher sloping initial growth and the lower sloping subsequent growth.

    # starting values use lines fitted to 1st ten and last 10 points
    fm_1 <- lm(y ~ x, df, subset = 1:10)
    fm_2 <- lm(y ~ x, df, subset = seq(to = nrow(df), length = 10))
    st <- list(a = coef(fm_1)[[1]], b = coef(fm_1)[[2]],
               c = coef(fm_2)[[1]], d = coef(fm_2)[[2]])
    
    fm3 <- nls(y ~ pmin(a + b * x, c + d * x), df, start = st, weights = w)
    
    # point of transition
    X <- with(as.list(coef(fm3)), (a - c) / (d - b)); X
    ## [1] 16.38465
    Y <- with(as.list(coef(fm3)), a + b * X); Y
    ## [1] 0.8262229
    
    plot(y ~ x, df)
    lines(fitted(fm3) ~ x, df)
    

    screenshot