Usually I have difficulties to understand how loess
interpolation works, in particular the definition of intervals. Often I am not able to understand why they do not appear.
As far as I know it is needed at least 5 data points but in some case even with more than 5 I am not able to see them.
I report below the last example I have:
# Dataframe 1 creation
temperatura <- c(120, 130, 140, 150, 160, 120, 130, 140, 150, 160)
viscosita <- c(65000, 45000, 29500, 20500, 15500, 65000, 45000, 29500, 20500, 15500)
df <- data.frame(Temperatura = temperatura, Viscosita = viscosita)
library(ggplot2)
Graph1 <- ggplot(data=df, aes(x=`Temperatura`, y=`Viscosita`)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) +
geom_smooth(aes(level=0.99, span = 0.1))
Graph1
But if I change slightly one point, I get them:
# Dataframe 2 creation
temperatura <- c(120, 130, 140, 150, 160, 120, 130, 140, 150, 160)
viscosita <- c(65000, 45000, 29500, 20500, 15500, 65000, 43500, 29500, 20500, 15500)
df <- data.frame(Temperatura = temperatura, Viscosita = viscosita)
library(ggplot2)
Graph2 <- ggplot(data=df, aes(x=`Temperatura`, y=`Viscosita`)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) +
geom_smooth(aes(level=0.99, span = 0.1))
Graph2
I would like better to understand how it works.
I have tried to change parameters n (number of window-points where it make the loess calculation), level (% of confidence interval), span, but I was not able to get the bands on Graph1
.
If you wish to supply arguments to the loess
method in geom_smooth
, you need to put them in a list and pass them to the method.args
argument.
However, the main issue you have with the confidence intervals is just that you are essentially getting a perfect fit with the default degree-2 polynomial:
loess(viscosita ~ temperatura, data = df)
#> Call:
#> loess(formula = viscosita ~ temperatura, data = df)
#>
#> Number of Observations: 10
#> Equivalent Number of Parameters: 4.53
#> Residual Standard Error: 1.41e-11
The standard errors are so small that you cannot see them in a plot:
predict(loess(viscosita ~ temperatura, data = df), se = TRUE)
#> $fit
#> 1 2 3 4 5 6 7 8 9 10
#> 65000 45000 29500 20500 15500 65000 45000 29500 20500 15500
#>
#> $se.fit
#> 1 2 3 4 5 6
#> 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12
#> 7 8 9 10
#> 9.967537e-12 9.967537e-12 9.967537e-12 9.967537e-12
#>
#> $residual.scale
#> [1] 1.409623e-11
#>
#> $df
#> [1] 4.80231
So the plot is showing the standard errors, they are just several orders of magnitude smaller than a pixel wide. If you change to, say, a degree-1 fit, then you would see confidence intervals, but the fit isn't as good:
ggplot(data = df, aes(x = Temperatura, y = Viscosita)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) +
geom_smooth(method = 'loess', method.args = list(degree = 1))
I think though that ultimately loess
may be the wrong choice for drawing a smooth curve here. If you have a theoretical model to which you are trying to fit your data, you could use non-linear least squares. It looks as though you are modelling how viscosity in a liquid changes with temperature, in which case
Where A and B are constants is a commonly used model1.
To do this, we need to get nls
to produce standard errors, which requires the add-on package investr
and a couple of helper functions:
nls_se <- function(formula, data, start, type = "confidence", ...) {
mod <- nls(formula, data, start)
class(mod) <- "nls_se"
attr(mod, "type") <- type
mod
}
predict.nls_se <- function(model, newdata, level = 0.9, interval, ...) {
class(model) <- "nls"
p <- investr::predFit(model, newdata = newdata,
interval = attr(model, "type"), level = level)
list(fit = p, se.fit = p[,3] - p[,1])
}
This now allows:
mod <- nls(Viscosita ~ A * exp(B/Temperatura),
data = df, start = list(A = 1000, B = 200))
ggplot(data = df, aes(x = Temperatura, y = Viscosita)) +
geom_point(aes(color = "red"), shape = 1, size = 3.0, stroke = 1.5) +
geom_smooth(method = nls_se, formula = y ~ A * exp(B/x),
method.args = list(start = list(A = 1000, B = 200))) +
annotate('text', x = 140, y = 60000,
label = bquote(Viscosita == .(round(coef(mod)[1], 2)) *
e^{.(round(coef(mod)[2], 2)) / Temperatura}),
check_overlap = TRUE)
Which is nice because it gives you both the standard error on your plot and a formula for the best fitting line based on a theoretical model.