Smooth a binary variable using moving average or kernel smoothing


I have data of the form:

x      y
0      0
0.01   1
0.03   0
0.04   1
0.04   0

x is continuous from 0 to 1 and not equally spaced and y is binary.

I'd like to smooth y over the x-axis using R, but can't find the right package. The kernel smoothing functions I've found either produce density estimates of x or give the wrong estimate at the ends of the x range, because they average over regions below 0 and above 1.

I'd also like to avoid linear smoothers like LOESS, given the binary form of y. The moving-average functions I've seen assume equally spaced x-values.

Do you know of any R functions that will smooth and ideally have a bandwidth selection procedure? I can write a moving average function and cross-validate to determine the bandwidth, but I'd prefer to find an existing function that's been vetted.
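
For reference, the hand-rolled approach mentioned above could be sketched roughly as follows. This is only a minimal sketch, not a vetted implementation: the Gaussian kernel, the squared-error leave-one-out criterion, the bandwidth grid, the simulated data, and the helper names `nw_smooth` and `loocv` are all assumptions made for illustration.

```r
# Kernel-weighted moving average (Nadaraya-Watson) of y at the points x0,
# using a Gaussian kernel with bandwidth h. Works for unequally spaced x.
nw_smooth <- function(x0, x, y, h) {
  sapply(x0, function(u) {
    w <- dnorm((x - u) / h)   # weights decay with distance from u
    sum(w * y) / sum(w)       # weighted mean of 0/1 values, so it lies in [0, 1]
  })
}

# Leave-one-out cross-validation score (mean squared error) for bandwidth h
loocv <- function(h, x, y) {
  pred <- sapply(seq_along(x), function(i) nw_smooth(x[i], x[-i], y[-i], h))
  mean((y - pred)^2)
}

# Simulated stand-in data (hypothetical): unequally spaced x, binary y
set.seed(1)
x <- sort(runif(300))
y <- rbinom(300, size = 1, prob = plogis(-2 + 4 * x))

# Choose the bandwidth from an assumed grid by minimising the LOOCV score
hs <- seq(0.02, 0.5, by = 0.02)
scores <- sapply(hs, function(h) loocv(h, x, y))
h_best <- hs[which.min(scores)]

# Smoothed estimate of P(y = 1 | x) on a regular grid
grid <- seq(0, 1, length.out = 101)
fit <- nw_smooth(grid, x, y, h_best)
```

Because each estimate is a weighted mean of observed points only, nothing is averaged over x below 0 or above 1, although a local average like this can still be biased near the ends of the range.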


Solution

  • I would suggest using something like

    d <- data.frame(x,y) ## not absolutely necessary but good practice
    library(mgcv)
    m1 <- gam(y~s(x),family="binomial",data=d)
    

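    This will (1) respect the binary nature of the data and (2) select the degree of smoothness ("bandwidth" in your terminology) automatically. By default mgcv does this with a generalized cross-validation type criterion (UBRE for families with a known scale parameter, such as the binomial); method="REML" is an alternative.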

    Use

    plot(y~x, data=d)
    pp <- data.frame(x=seq(0,1,length=101))
    pp$y <- predict(m1,newdata=pp,type="response")
    with(pp,lines(x,y))
    

    or

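    library(ggplot2)
    ## current ggplot2 takes model arguments via method.args, not directly
    ggplot(d,aes(x,y))+
      geom_smooth(method="gam",formula=y~s(x),method.args=list(family=binomial))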
    

    to get predictions/plot the results.

    (I hope your real data set has more than 5 observations ... otherwise this will fail ...)
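
    To try the suggestion end to end without the original data, something like the following sketch may help. The simulated data set (sample size, seed, and the logistic form of the true probability) is an assumption made purely for illustration; only the `gam()` and `predict()` calls mirror the code above.

    ```r
    library(mgcv)

    # Simulated stand-in data (hypothetical): unequally spaced x in [0, 1], binary y
    set.seed(1)
    n <- 500
    x <- sort(runif(n))
    p_true <- plogis(-2 + 4 * x)              # assumed true P(y = 1 | x)
    y <- rbinom(n, size = 1, prob = p_true)
    d <- data.frame(x, y)

    # Binomial GAM with a smooth term in x; smoothness is selected automatically
    m1 <- gam(y ~ s(x), family = binomial, data = d)

    # Predicted probabilities on a regular grid over the observed range
    pp <- data.frame(x = seq(0, 1, length.out = 101))
    pp$y <- predict(m1, newdata = pp, type = "response")

    range(pp$y)     # predictions are probabilities, so they stay within [0, 1]
    m1$sp           # the selected smoothing parameter
    sum(m1$edf)     # effective degrees of freedom of the fitted model
    ```

    Because the fit is on the logit scale and `type="response"` back-transforms it, the smoothed curve cannot leave [0, 1], including at the ends of the x range.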