Tags: r, regression, kernel-density

Kernel regression with 2 independent variables


For a Nadaraya–Watson kernel regression estimate, we use the following in R:
ksmooth(x, y, kernel = c("box", "normal"), bandwidth = 0.5, range.x = range(x), n.points = max(100L, length(x)), x.points)

What am I supposed to use when I have two independent variables instead of a single x, say x1 and x2? How do I change range.x, n.points, and x.points?


Solution

  • You can perform kernel regression with multiple independent variables using npreg from the np package:

    library(np)
    
    mod <- npreg(mpg ~ wt + hp, data = mtcars)                
    
    mod
    #> 
    #> Regression Data: 32 training points, in 2 variable(s)
    #>                      wt       hp
    #> Bandwidth(s): 0.2401575 16.63531
    #> 
    #> Kernel Regression Estimator: Local-Constant
    #> Bandwidth Type: Fixed
    #> 
    #> Continuous Kernel Type: Second-Order Gaussian
    #> No. Continuous Explanatory Vars.: 2
    
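    The bandwidths reported above are chosen automatically by least-squares cross-validation (the np default). To see what the local-constant (Nadaraya–Watson) estimator is doing, here is a minimal hand-rolled sketch, for illustration only, that weights each observation by a product of Gaussian kernels (one per predictor) using those bandwidths; it should agree closely with fitted(mod):

    # Hand-rolled Nadaraya-Watson estimate with a product Gaussian kernel
    # (illustration only -- npreg does this, plus bandwidth selection, for you)
    nw_fit <- function(x1, x2, y, h1, h2, x1_new, x2_new) {
      sapply(seq_along(x1_new), function(i) {
        w <- dnorm((x1 - x1_new[i]) / h1) * dnorm((x2 - x2_new[i]) / h2)
        sum(w * y) / sum(w)
      })
    }
    
    by_hand <- nw_fit(mtcars$wt, mtcars$hp, mtcars$mpg,
                      h1 = 0.2401575, h2 = 16.63531,
                      x1_new = mtcars$wt, x2_new = mtcars$hp)
    all.equal(unname(by_hand), unname(fitted(mod)))  # expected to be (close to) TRUE
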

    Just as kernel regression with a single independent variable can be represented as a curve in 2D space, with two independent variables the fit is a curved surface in 3D space. We can show the result graphically by predicting over a grid of (wt, hp) locations. Here is an example of the surface created by kernel regression, estimating mpg from wt and hp for the mtcars data set:

    library(ggplot2)
    
    newdat <- expand.grid(hp = seq(50, 350, 0.1), wt = seq(1, 5.5, 0.02))
    newdat$mpg <- predict(mod, newdata = newdat)
    
    ggplot(newdat, aes(wt, hp, fill = mpg)) +
      geom_tile() +
      geom_point(data = mtcars, shape = 21, size = 3) +
      scale_fill_viridis_c()
    

    [Figure: heat map of predicted mpg over the wt–hp grid, with the observed mtcars points overlaid]

    Created on 2022-10-08 with reprex v2.0.2
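
    If you prefer a perspective (3D) view of the same fitted surface rather than a heat map, one option is base R's persp(). A minimal sketch, using a coarser grid than above and purely for illustration:

    # Coarser prediction grid for a 3D perspective view of the same surface
    hp_seq <- seq(50, 350, by = 5)
    wt_seq <- seq(1, 5.5, by = 0.1)
    grid3d <- expand.grid(hp = hp_seq, wt = wt_seq)
    
    # expand.grid varies hp fastest, so filling by column gives z[i, j] at (hp_seq[i], wt_seq[j])
    z <- matrix(predict(mod, newdata = grid3d), nrow = length(hp_seq))
    
    persp(hp_seq, wt_seq, z, theta = 40, phi = 25,
          xlab = "hp", ylab = "wt", zlab = "predicted mpg",
          ticktype = "detailed")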