r, data-cleaning

How to get the set of points (x, y) in a desired area in R


[Figure: scatter plot of the (x, y) pairs, with the noisy points circled in red]

The figure is a plot of the (x, y) pairs from an Excel file, 8760 pairs in total. I want to remove the noisy data pairs in the red-circled area and write the remaining pairs to a new Excel file. How can I do this in R?


Solution

  • Using @G5W's example:

    Make up data:

    set.seed(2017)
    x = runif(8760, 0,16)
    y = c(abs(rnorm(8000, 0, 1)), runif(760,0,8)) 
    XY = data.frame(x,y)
    

    Fit a quantile regression to the 90th percentile:

    library(quantreg)
    library(splines)
    qq <- rq(y~ns(x,20),tau=0.9,data=XY)
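
    As a quick sanity check on what tau=0.9 means, roughly 90% of the points should end up below the fitted curve. A minimal sketch, assuming the quantreg package is installed and reusing the made-up data from above (the name frac_below is only illustrative):

    ```r
    library(quantreg)
    library(splines)

    # Same made-up data as in the answer
    set.seed(2017)
    x <- runif(8760, 0, 16)
    y <- c(abs(rnorm(8000, 0, 1)), runif(760, 0, 8))
    XY <- data.frame(x, y)

    # 90th-percentile quantile regression on a natural spline basis
    qq <- rq(y ~ ns(x, 20), tau = 0.9, data = XY)

    # Share of points strictly below the fitted curve; it should be close to 0.9
    frac_below <- mean(XY$y < predict(qq, newdata = XY))
    print(frac_below)
    ```

    If too many or too few points survive the filter, tau is the knob to adjust: a larger tau moves the curve up and keeps more points.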
    

    Compute and draw the predicted curve:

    xvec <- seq(0,16,length.out=101)
    pp <- predict(qq,newdata=data.frame(x=xvec))
    plot(y~x,data=XY)
    lines(xvec,pp,col=2,lwd=2)
    

    [Figure: scatter plot of the data with the fitted 90th-percentile curve in red]

    Keep only points below the predicted line:

    XY2 <- subset(XY,y<predict(qq,newdata=data.frame(x)))
    
    plot(y~x,data=XY2)
    lines(xvec,pp,col=2,lwd=2)
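
    The question also asks for the remaining pairs to be written to a new file. A minimal sketch, assuming a CSV file (which Excel opens directly) is acceptable; the file name filtered_points.csv is hypothetical, and the data are the made-up ones from above rather than the real spreadsheet. For a native .xlsx file, writexl::write_xlsx() is one option if the writexl package is installed, and readxl::read_excel() can read the original workbook.

    ```r
    library(quantreg)
    library(splines)

    # Same made-up data and fit as in the answer
    set.seed(2017)
    x <- runif(8760, 0, 16)
    y <- c(abs(rnorm(8000, 0, 1)), runif(760, 0, 8))
    XY <- data.frame(x, y)
    qq <- rq(y ~ ns(x, 20), tau = 0.9, data = XY)

    # Keep only the points below the fitted curve
    XY2 <- XY[XY$y < predict(qq, newdata = XY), ]

    # Write the remaining pairs to a CSV file (hypothetical name, temp directory)
    out_file <- file.path(tempdir(), "filtered_points.csv")
    write.csv(XY2, out_file, row.names = FALSE)

    # Read the file back to confirm what was written
    check <- read.csv(out_file)
    print(dim(check))
    ```

    Setting row.names = FALSE keeps the row indices of the original data out of the file, so the output has just the x and y columns.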
    

    [Figure: scatter plot of the remaining points, all below the fitted curve]

    You can make the line less wiggly by lowering the spline's degrees of freedom (the second argument of ns(), which determines the number of knots), e.g. y~ns(x,10)
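
    To see the effect, both fits can be drawn on the same plot. A minimal sketch, assuming the quantreg package is installed and reusing the made-up data from above; the names q20, q10, p20 and p10 are only illustrative:

    ```r
    library(quantreg)
    library(splines)

    # Same made-up data as in the answer
    set.seed(2017)
    x <- runif(8760, 0, 16)
    y <- c(abs(rnorm(8000, 0, 1)), runif(760, 0, 8))
    XY <- data.frame(x, y)

    # Two 90th-percentile fits with different spline flexibility
    q20 <- rq(y ~ ns(x, 20), tau = 0.9, data = XY)
    q10 <- rq(y ~ ns(x, 10), tau = 0.9, data = XY)

    # Predicted curves on a common grid
    xvec <- seq(0, 16, length.out = 101)
    p20 <- predict(q20, newdata = data.frame(x = xvec))
    p10 <- predict(q10, newdata = data.frame(x = xvec))

    # Overlay both curves on the data
    plot(y ~ x, data = XY)
    lines(xvec, p20, col = 2, lwd = 2)
    lines(xvec, p10, col = 4, lwd = 2)
    legend("topright", legend = c("ns(x, 20)", "ns(x, 10)"),
           col = c(2, 4), lwd = 2)
    ```

    The fit with fewer degrees of freedom has fewer coefficients to bend with, so its curve follows local fluctuations in the data less closely.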