I have a scatterplot and want to filter the data behind it.
The image shows four layers: 1) the middle green curve, 2) the upper black curve, 3) the lower black curve, 4) the blue scatterplot.
All of them are built from the same data frame:
Blue scatterplot:
df <- mtcars
geom_point(data = df, aes(x, y), color = 'blue')
Green curve:
geom_smooth(formula=y~x, method='loess', color='green3', se=FALSE, size=0.5)
Upper Curve:
geom_smooth(formula=y+1~x, method='loess', color='gray20', se=FALSE, size=0.5)
Lower Curve:
geom_smooth(formula=y-1~x, method='loess', color='gray20', se=FALSE, size=0.5)
I want to filter the blue data points by the black curves, so that only the points lying between the two black lines remain and the outliers are removed.
I tried the which(), filter(), and subset() functions, but none of them produced the output I want.
In the end, I want the scatter data which lies between those two black lines.
I am posting a solution since this question may be helpful to others. The general idea is conditional coloring of the points: if a point falls between the curves, it gets a color; otherwise its color is NA, so it is not drawn.
Here, I assume that we know the functions behind the curves, so we can use them in our ifelse(). If that's not the case, we first need to find the best fit. You can find helpful answers about fitting a curve to specific data in this thread.
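# Reference curve y = x^4 on x = 1..10
x <- (1:10)
y <- x^4
# Random points to be filtered
set.seed(123)
xp <- rnorm(100, mean=5.5, sd = 4)
yp <- rnorm(100, mean=5e3, sd=5e3)
# Lower curve, middle (green) curve, upper curve
plot(x,y, type = "l")
lines(x, y+mean(y), col = "green")
lines(x, y+2*mean(y))
# A point is drawn in blue only if it lies between the lower and upper curves; col = NA hides it
points(x=xp, y=yp, type = "p", col=ifelse(yp < xp^4 + 2*mean(y) & yp > xp^4, "blue", NA))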
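If you need the filtered data itself rather than hidden points, and the curves come from a loess smoother as in the question, one option is to fit the smoother yourself and keep the rows whose distance from the fitted curve is below the band width. This is only a minimal sketch under assumptions: the question does not say which mtcars columns are x and y, so wt and mpg are picked purely for illustration, and the band of 1 is taken from the y+1 and y-1 formulas. stats::loess with its default settings should give a curve close to what geom_smooth(method = 'loess') draws, but I have not checked that they match exactly.

```r
# Hypothetical column choice: wt and mpg stand in for the unnamed x and y.
df <- data.frame(x = mtcars$wt, y = mtcars$mpg)

# Fit a loess smoother; this plays the role of the middle (green) curve.
fit <- loess(y ~ x, data = df)

# Value of the fitted curve at each observed x.
df$mid <- predict(fit, newdata = df)

# The black curves are the fit shifted by +1 and -1, so a point lies
# between them when its residual is smaller than 1 in absolute value.
band <- 1
inside <- abs(df$y - df$mid) < band
df_filtered <- df[inside, c("x", "y")]

# Check the result visually with base graphics.
ord <- order(df$x)
plot(df$x, df$y, type = "n", xlab = "x", ylab = "y")
lines(df$x[ord], df$mid[ord], col = "green3")
lines(df$x[ord], df$mid[ord] + band, col = "gray20")
lines(df$x[ord], df$mid[ord] - band, col = "gray20")
points(df_filtered$x, df_filtered$y, col = "blue")
```

df_filtered then holds only the points between the two black curves. With dplyr the same step could be written as filter(df, abs(y - mid) < band).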