Consider the following data:
probs <- seq(0, 0.3, by = 0.001)
targets <- sapply(probs, function(p) {
  sample(c(0, 1), size = 1, prob = c(1 - p, p))
})
Using loess, I can then graph the smoothed "targets" values as estimates of the probabilities:
require(magrittr)
loess(targets~probs,span=0.3) %>% predict %>% {plot(. ~ probs)}
However, I am not able to do the same using lowess, no matter which f value is chosen:
lowess(x = probs, y = targets, f = 0.01) %>% with(plot(y ~ x))
My questions: why do the results differ, and is there a way to make the lowess output match the loess one?
Numerous threads on SO suggest that, in the univariate case, loess and lowess should match.
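That claim can be checked directly. The two functions have different defaults (lowess is local-linear with robustness iterations; loess defaults to local-quadratic with none), so a sketch of aligned settings, under the assumption that these are the only differences, looks like this:

```r
# Align the defaults: local-linear fit, no robustness iterations,
# no interpolation shortcuts on either side.
set.seed(1)
x <- sort(runif(100))
y <- x + rnorm(100, sd = 0.1)

fit_lowess <- lowess(x, y, f = 2/3, iter = 0, delta = 0)
fit_loess  <- loess(y ~ x, span = 2/3, degree = 1,
                    control = loess.control(surface = "direct"))

# Should be ~0 if the settings line up as assumed.
max(abs(fit_lowess$y - predict(fit_loess)))
```

With any of these settings left at its default (degree = 2, iter = 3, delta > 0, or the interpolated surface), the outputs diverge.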
Unrelated side note: why not just use loess, then? The goal is to understand the differences between lowess and loess. Furthermore, I would like to reproduce the results with Python's statsmodels, which, to my knowledge, provides only lowess.
It's easier to generate your random sample using rbinom:
probs <- seq(0, 0.3, by = 0.001)
set.seed(1)
targets <- rbinom(301, 1, probs)
The loess smooth looks like this:
est_loess <- loess(targets ~ probs, span = 0.3) |> predict()
plot(probs, est_loess, type = "l")
If you want a similar result from lowess, try setting iter to 0:
est_lowess <- lowess(x = probs, y = targets, f = 0.2, iter = 0)
plot(est_lowess, type = "l")
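Why iter matters here (the assumed mechanism): lowess runs iter robustness iterations (3 by default) that downweight points with large residuals. With a 0/1 response, the sparse 1s look like outliers relative to a fit that hovers near zero, so they get downweighted and the curve is flattened toward 0. A quick comparison, regenerating the data as above:

```r
set.seed(1)
probs <- seq(0, 0.3, by = 0.001)
targets <- rbinom(301, 1, probs)

robust <- lowess(probs, targets, f = 0.2)            # iter = 3 (default)
plain  <- lowess(probs, targets, f = 0.2, iter = 0)  # no reweighting

# Positive if the robust fit is flattened relative to the plain one.
max(plain$y) - max(robust$y)
```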
In either case, be very careful when smoothing probabilities like this: you risk fitted values falling outside the 0-1 range. Where possible, you should convert to odds, smooth these, then convert back to probabilities. One way to achieve this is to use gam with family = "binomial":
library(mgcv)
est_gam <- gam(targets ~ s(probs, k = 100, m = 1), gamma = 0.9,
               family = binomial) |>
  predict(type = "response")
plot(probs, est_gam, type = "l", ylim = c(0, 0.3))
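The payoff of fitting on the link scale: predictions pass through the inverse logit, so they are guaranteed to lie strictly inside (0, 1), whereas nothing constrains the loess fit. A quick check on the same simulated data:

```r
library(mgcv)
set.seed(1)
probs <- seq(0, 0.3, by = 0.001)
targets <- rbinom(301, 1, probs)

est_loess <- predict(loess(targets ~ probs, span = 0.3))
est_gam <- predict(gam(targets ~ s(probs, k = 100, m = 1),
                       gamma = 0.9, family = binomial),
                   type = "response")

all(est_gam > 0 & est_gam < 1)  # TRUE: bounded by the logit link
range(est_loess)                # unconstrained; may stray outside [0, 1]
```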
Created on 2023-09-06 with reprex v2.0.2