I fit a kernel smoother like so:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10)
print(kernel$y)
If I try to predict at a point outside the range of the x values, it gives me NA, because it is attempting to extrapolate beyond the data:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
Even when I change range.x it doesn't budge:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200) , x.points=c(130))
print(kernel$y)
> print(kernel$y)
[1] NA
How do I get the ksmooth function to extrapolate beyond the data? I know this is a bad idea in theory, but in practice this issue comes up all the time.
To answer your side question: looking at the code of ksmooth, range.x is only used when x.points is not provided, which explains why you do not see it take effect. Let's look at the code in ksmooth:
function (x, y, kernel = c("box", "normal"), bandwidth = 0.5,
    range.x = range(x), n.points = max(100L, length(x)), x.points)
{
    if (missing(y) || is.null(y))
        stop("numeric y must be supplied.\nFor density estimation use density()")
    kernel <- match.arg(kernel)
    krn <- switch(kernel, box = 1L, normal = 2L)
    x.points <- if (missing(x.points))
        seq.int(range.x[1L], range.x[2L], length.out = n.points)
    else {
        n.points <- length(x.points)
        sort(x.points)
    }
    ord <- order(x)
    .Call(C_ksmooth, x[ord], y[ord], x.points, krn, bandwidth)
}
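As a quick check of this reading of the source, here is a minimal sketch that calls ksmooth twice with the same x.points, once with range.x and once without. The seed and the evaluation points (50 and 130) are arbitrary choices of mine, not something from the question.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)

pts <- c(50, 130)  # one point inside the data range, one outside

with_range <- ksmooth(x, y, kernel = "normal", bandwidth = 10,
                      range.x = c(1, 200), x.points = pts)
without_range <- ksmooth(x, y, kernel = "normal", bandwidth = 10,
                         x.points = pts)

# If range.x is ignored whenever x.points is supplied,
# both calls should return exactly the same estimates.
same <- identical(with_range$y, without_range$y)
print(same)
print(with_range$y)
```

If the two results match, range.x had no influence on the call, which is what the branch on missing(x.points) suggests.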
From this we see that we need to not provide x.points to make sure that range.x is used. If you run:
x <- 1:100
y <- rnorm(100, mean=(x/2000)^2)
plot(x,y)
kernel <- ksmooth(x,y, kernel="normal", bandwidth=10, range.x=c(1,200))
plot(kernel$x, kernel$y)
Now you'll see that the smoother is evaluated beyond 100 (although not all the way up to 200: grid points too far from the data still come back as NA). Increasing the bandwidth parameter allows you to get even further away from 100.
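To see how far the estimates reach for a given bandwidth, something like the following sketch may help. The helper reach() is a hypothetical name of my own, and the bandwidths 10, 30 and 40 are arbitrary illustration values. The last call is an assumption worth testing on your own data: with a large enough bandwidth, passing x.points = 130 directly should also return a number rather than NA.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- 1:100
y <- rnorm(100, mean = (x / 2000)^2)

# Hypothetical helper: the largest grid point at which ksmooth
# returns a non-NA estimate for a given bandwidth.
reach <- function(bw) {
  fit <- ksmooth(x, y, kernel = "normal", bandwidth = bw,
                 range.x = c(1, 200))
  max(fit$x[!is.na(fit$y)])
}

reach10 <- reach(10)
reach30 <- reach(30)
print(c(reach10, reach30))

# With a wide enough bandwidth, a single point outside the data
# can be requested directly through x.points.
at130 <- ksmooth(x, y, kernel = "normal", bandwidth = 40,
                 x.points = 130)$y
print(at130)
```

Keep in mind that a kernel smoother returns a weighted average of nearby observations, so a value past the end of the data is essentially an average of the last few points and does not follow any trend in them.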