Search code examples
rcurve-fittingmodel-fitting

How to estimate the best fitting function to a scatter plot in R?


I have scatterplot of two variables, for instance this:

x<-c(0.108,0.111,0.113,0.116,0.118,0.121,0.123,0.126,0.128,0.131,0.133,0.136)

y<-c(-6.908,-6.620,-5.681,-5.165,-4.690,-4.646,-3.979,-3.755,-3.564,-3.558,-3.272,-3.073)

and I would like to find the function that better fits the relation between these two variables.

to be precise I would like to compare the fitting of three models: linear, exponential and logarithmic.

I was thinking about fitting each function to my values, calculate the likelihoods in each case and compare the AIC values.

But I don't really know how or where to start. Any possible help about this would be extremely appreciated.

Thank you very much in advance.

Tina.


Solution

  • Here is an example of comparing five models. Due to the form of the first two models we are able to use lm to get good starting values. (Note that models using different transforms of y should not be compared so we should not use lm1 and lm2 as comparison models but only for starting values.) Now run an nls for each of the first two. After these two models we try polynomials of various degrees in x. Fortunately lm and nls use consistent AIC definitions (although its not necessarily true that other R model fitting functions have consistent AIC definitions) so we can just use lm for the polynomials. Finally we plot the data and fits of the first two models.

    The lower the AIC the better so nls1 is best followed by lm3.2 following by nls2 .

    lm1 <- lm(1/y ~ x)
    nls1 <- nls(y ~ 1/(a + b*x), start = setNames(coef(lm1), c("a", "b")))
    AIC(nls1) # -2.390924
    
    lm2 <- lm(1/y ~ log(x))
    nls2 <- nls(y ~ 1/(a + b*log(x)), start = setNames(coef(lm2), c("a", "b")))
    AIC(nls2) # -1.29101
    
    lm3.1 <- lm(y ~ x) 
    AIC(lm3.1) # 13.43161
    
    lm3.2 <- lm(y ~ poly(x, 2))
    AIC(lm3.2) # -1.525982
    
    lm3.3 <- lm(y ~ poly(x, 3))
    AIC(lm3.3) # 0.1498972
    
    plot(y ~ x)
    
    lines(fitted(nls1) ~ x, lty = 1) # solid line
    lines(fitted(nls2) ~ x, lty = 2) # dashed line
    

    enter image description here

    ADDED a few more models and subsequently fixed them up and changed notation. Also to follow up on Ben Bolker's comment we can replace AIC everywhere above with AICc from the AICcmodavg package.