Suppose I have a data frame such as:
x <- round(runif(1000,-5,5), 2)
y <- round(runif(1000,0,5), 2)
z <- sprintf("%s%05d", "A", seq.int(1000))
df <- data.frame(x, y, z)
How can I find which data points (identified by their names in column z) are outliers relative to a non-linear threshold of the form
y = a/(|x|-c)
where a and c are values that I can choose arbitrarily, and |x| is the modulus (absolute value) of x?
As mentioned in the comment, you can create a short function for this:
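find_outliers <- function(df, a, c) {
  # Threshold curve y = a / (|x| - c), evaluated at each point's x
  y_threshold <- a / (abs(df$x) - c)
  # Return the names (column z) of the points lying above the curve
  df$z[df$y > y_threshold]
}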
a=1
c=0.1
find_outliers(df,a,c)
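One thing to watch for: when |x| < c the threshold a/(|x|-c) is negative (for positive a), so every point in that band has y above it and gets reported, and at |x| = c the threshold is infinite. Whether those points should count as outliers depends on what you intend, so the following is only a sketch under the assumption that you want to compare points only where |x| > c. The function name `find_outliers_outside` and the `toy` data frame are made up for illustration.

```r
# Hypothetical variant: only test points where the threshold is
# defined and positive, i.e. where |x| - c > 0 (an assumption).
find_outliers_outside <- function(df, a, c) {
  denom <- abs(df$x) - c
  valid <- denom > 0                 # excludes |x| <= c
  y_threshold <- a / denom
  # as.character() so the result is plain names even if z is a factor
  as.character(df$z[valid & df$y > y_threshold])
}

# Small example that can be checked by hand
toy <- data.frame(
  x = c(0.05, 2, -2, 0.1),
  y = c(1, 0.2, 3, 4),
  z = c("A1", "A2", "A3", "A4")
)
find_outliers_outside(toy, a = 1, c = 0.1)
# only "A3" is returned: A1 and A4 have |x| <= c, and A2 lies below the curve
```

If you do want the points with |x| < c reported as well, the original function already does that, so no change is needed.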