r ggplot2 regression lattice stat

extract equation used to get the best fit using Lattice (panel.smoother) or ggplot


I have 100+ files with similar data following almost the same trend. I have managed to obtain best fits for all of them, but now I want to compare them to a theoretical argument. In other words, I would like to generate one equation for the best-fit curves I have produced from the experimental data; the equation should work for any values in a specific range and produce a curve similar to the fitted one, allowing for some error.

Code:

set.seed(42)
x <-sort(round(runif(10,0,53)))   ## random x values
y <- runif(10,0,400)              ## random y values
data1 <-  data.frame(y=y,x=x)     ## creating a data frame

Now I either use lattice like below:

library(lattice)
library(latticeExtra)
xyplot(y ~ x,data=data1,par.settings = ggplot2like(),
                   panel = function(x,y,...){
                     panel.xyplot(x,y,...)
                   })+ layer(panel.smoother(y ~ x, se = FALSE, span = 0.5))

Or ggplot as follows:

library(ggplot2)
ggplot(data1, aes(x=x, y=y)) + geom_point() + geom_smooth(se = FALSE)


I would just like to know its equation, or maybe just a few parameters of the curve (coefficients, standard error values, etc.).


Solution

  • Smoothers generally are more complex than you appear to understand. They are usually only defined locally, so there is no global equation as there might be with a polynomial fit. The panel.smoother function uses the loess smoother by default, and there is no equation in the object returned from your call to xyplot. Rather, there is a call to the panel.smoother function kept in the lay object in the environment of the plot's panel function:

     myplot <- xyplot(y ~ x,data=data1,par.settings = ggplot2like(),
                   panel = function(x,y,...){
                     panel.xyplot(x,y,...)
                   })+ layer(panel.smoother(y ~ x, se = FALSE, span = 0.5))
     get('lay', envir = environment(myplot$panel))
    #-------------
    [[1]]
    expression(panel.smoother(y ~ x, se = FALSE, span = 0.5))
    attr(,"under")
    [1] FALSE
    attr(,"superpose")
    [1] FALSE
    
    attr(,"class")
    [1] "layer"   "trellis"
    

    This shows the kind of object produced when that expression gets evaluated (here with loess's default span rather than span = 0.5):

    mysmooth <- loess(y~x)
    str(mysmooth)
    #--------
    List of 17
     $ n        : int 10
     $ fitted   : num [1:10] 176 312 275 261 261 ...
     $ residuals: Named num [1:10] 6.78 -24.43 98.8 -159.25 -75.9 ...
      ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
     ----------- omitting remainder of output------------
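
Although a loess fit has no closed-form equation, the fitted object can still be evaluated at new x values inside the observed range with predict, which may be enough for comparing the curve against a theoretical one. The following is a minimal sketch, assuming the data1 data frame from the question and loess's default span; the names newx and pred are made up for illustration.

```r
# Recreate the example data from the question
set.seed(42)
x <- sort(round(runif(10, 0, 53)))
y <- runif(10, 0, 400)
data1 <- data.frame(y = y, x = x)

# Fit the loess smoother directly (default span, as in str(mysmooth) above)
mysmooth <- loess(y ~ x, data = data1)

# Evaluate the smooth on a grid of new x values within the observed range
newx <- data.frame(x = seq(min(data1$x), max(data1$x), length.out = 25))
pred <- predict(mysmooth, newdata = newx, se = TRUE)

# Fitted values and pointwise standard errors at the new x values
head(data.frame(x = newx$x, fit = pred$fit, se = pred$se.fit))
```

By default loess does not extrapolate, so x values outside the observed range come back as NA.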
    

    I used the xyplot smoother because finding the code details inside a ggplot function result is even more complex than doing the same for a lattice object. Moral of this story: if you want a function of particular complexity and definable characteristics, use a suitable spline function, such as spline, or pspline in survival, or rcs in rms.
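
As one possible illustration of the spline route, the sketch below fits a natural cubic spline with ns from the splines package (swapped in here for the functions named above because it ships with R) inside an ordinary lm call. The choice df = 3 is arbitrary, and curve_fun is a hypothetical helper name. The result has coefficients and standard errors, but they belong to spline basis terms rather than to powers of x.

```r
library(splines)

# Recreate the example data from the question
set.seed(42)
x <- sort(round(runif(10, 0, 53)))
y <- runif(10, 0, 400)
data1 <- data.frame(y = y, x = x)

# Linear model on a natural cubic spline basis (df = 3 is an assumption)
fit <- lm(y ~ ns(x, df = 3), data = data1)

# Estimates, standard errors, t values and p values of the basis terms
summary(fit)$coefficients

# Wrap the fitted curve as a plain function of x
curve_fun <- function(xnew) predict(fit, newdata = data.frame(x = xnew))
curve_fun(c(10, 25, 40))
```

Because the model is a regular lm fit, it can be drawn in ggplot with geom_smooth(method = "lm", formula = y ~ splines::ns(x, df = 3)), so the plotted curve and the extracted coefficients come from the same model.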