Definition of intervals with loess method

Usually I have difficulties to understand how loess interpolation works, in particular the definition of intervals. Often I am not able to understand why they do not appear. As far as I know it is needed at least 5 data points but in some case even with more than 5 I am not able to see them.

I report below the last example I have:

# Dataframe 1 creation
temperatura <- c(120, 130, 140, 150, 160, 120, 130, 140, 150, 160)
viscosita <- c(65000, 45000, 29500, 20500, 15500, 65000, 45000, 29500, 20500, 15500)
df <- data.frame(Temperatura = temperatura, Viscosita = viscosita)

library(ggplot2)

Graph1 <- ggplot(data=df, aes(x=`Temperatura`, y=`Viscosita`)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) + 
geom_smooth(aes(level=0.99, span = 0.1)) 

Graph1

But if I change slightly one point, I get them:

# Dataframe 2 creation
temperatura <- c(120, 130, 140, 150, 160, 120, 130, 140, 150, 160)
viscosita <- c(65000, 45000, 29500, 20500, 15500, 65000, 43500, 29500, 20500, 15500)
df <- data.frame(Temperatura = temperatura, Viscosita = viscosita)

library(ggplot2)

Graph2 <- ggplot(data=df, aes(x=`Temperatura`, y=`Viscosita`)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) + 
geom_smooth(aes(level=0.99, span = 0.1)) 

Graph2

I would like better to understand how it works. I have tried to change parameters n (number of window-points where it make the loess calculation), level (% of confidence interval), span, but I was not able to get the bands on Graph1.

Solution

If you wish to supply arguments to the loess method in geom_smooth, you need to put them in a list and pass them to the method.args argument.

However, the main issue you have with the confidence intervals is just that you are essentially getting a perfect fit with the default degree-2 polynomial:

loess(viscosita ~ temperatura, data = df)
#> Call:
#> loess(formula = viscosita ~ temperatura, data = df)
#>
#> Number of Observations: 10 
#> Equivalent Number of Parameters: 4.53 
#> Residual Standard Error: 1.41e-11

The standard errors are so small that you cannot see them in a plot:

predict(loess(viscosita ~ temperatura, data = df), se = TRUE)
#> $fit
#>     1     2     3     4     5     6     7     8     9    10 
#> 65000 45000 29500 20500 15500 65000 45000 29500 20500 15500 
#> 
#> $se.fit
#>            1            2            3            4            5            6 
#> 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 
#>            7            8            9           10 
#> 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 
#> 
#> $residual.scale
#> [1] 1.409623e-11
#> 
#> $df
#> [1] 4.80231

So the plot is showing the standard errors, they are just several orders of magnitude smaller than a pixel wide. If you change to, say, a degree-1 fit, then you would see confidence intervals, but the fit isn't as good:

ggplot(data = df, aes(x = Temperatura, y = Viscosita)) +
  geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) + 
  geom_smooth(method = 'loess', method.args = list(degree = 1))

I think though that ultimately loess may be the wrong choice for drawing a smooth curve here. If you have a theoretical model to which you are trying to fit your data, you could use non-linear least squares. It looks as though you are modelling how viscosity in a liquid changes with temperature, in which case

$Viscosity = A * e^{B/Temperature}$

Where A and B are constants is a commonly used model¹.

To do this, we need to get nls to produce standard errors, which requires the add-on package investr and a couple of helper functions:

nls_se <- function(formula, data, start, type = "confidence", ...) {
  mod <- nls(formula, data, start)
  class(mod) <- "nls_se"
  attr(mod, "type") <- type
  mod
}

predict.nls_se <- function(model, newdata, level = 0.9, interval, ...) {
  class(model) <- "nls"
  p <- investr::predFit(model, newdata = newdata, 
                        interval = attr(model, "type"), level = level)
  list(fit = p, se.fit = p[,3] - p[,1])
}

This now allows:

mod <- nls(Viscosita ~ A * exp(B/Temperatura), 
           data = df, start = list(A = 1000, B = 200))

ggplot(data = df, aes(x = Temperatura, y = Viscosita)) +
  geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) + 
  geom_smooth(method = nls_se, formula = y ~ A * exp(B/x),
              method.args = list(start = list(A = 1000, B = 200))) +
  annotate('text', x = 140, y = 60000, 
           label = bquote(Viscosita == .(round(coef(mod)[1], 2)) * 
                    e^{.(round(coef(mod)[2], 2)) / Temperatura}),
           check_overlap = TRUE)

Which is nice because it gives you both the standard error on your plot and a formula for the best fitting line based on a theoretical model.