Suppose I want to filter vector x based on which of its values appear, approximately, in vector y:
x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
If I didn't want approximate comparison, I'd use %in%:
x %in% y
[1] FALSE FALSE FALSE
The result I want is:
# something like: x %near_in% y
[1] TRUE TRUE FALSE
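R does let you define an infix operator like this yourself: any function whose name is wrapped in %...% can be called as x %op% y. Below is a minimal sketch; %near_in% is the hypothetical name from the question, not an existing function, and the tolerance is fixed at 1e-4 as an assumption because a binary operator only takes two arguments.

```r
# Hypothetical operator: TRUE where an element of x lies within tol of any element of y.
# tol is hard-coded because an infix operator receives only x and y.
`%near_in%` <- function(x, y) {
  tol <- 1e-4
  # For each value of x, check whether any element of y is closer than tol
  vapply(x, function(v) any(abs(v - y) < tol), logical(1))
}

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
x %near_in% y  # returns TRUE TRUE FALSE
```

This loops over x rather than building a matrix, but it still performs length(x) * length(y) comparisons in total.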
The help file for dplyr::near(x, y, tol) describes its arguments as "x, y: Numeric vectors to compare", but this is not entirely true: y has to be either the same length as x or a single value, because all near() does is compare abs(x - y) with the tolerance:
near <- function (x, y, tol = .Machine$double.eps^0.5)
{
abs(x - y) < tol
}
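To illustrate the two cases that do work, here is a small sketch using a local copy of the definition above (so it runs without loading dplyr): a single y value is compared with every element of x, while a y of the same length is compared element by element.

```r
# Local copy of the definition shown above
near <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

x <- c(1.123456789, 2.123456789, 3.123456789)

# Scalar y: every element of x is compared with 1.12345 (TRUE FALSE FALSE)
near(x, 1.12345, tol = 1e-4)

# Same-length y: x[i] is compared with y[i] only (TRUE TRUE TRUE)
near(x, c(1.12345, 2.12345, 3.12345), tol = 1e-4)
```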
So if we call it with our two vectors, the subtraction x - y recycles y's values until they match the length of x (with a warning), and we get:
abs(x - y)
[1] 0.000006789 0.000006789 2.000006789
Warning message:
In x - y : longer object length is not a multiple of shorter object length
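The recycling can be made explicit with rep_len(), which repeats y to a given length. This sketch shows that the warning-producing subtraction above compares x against c(1.12345, 2.12345, 1.12345), which is why the third value of x is checked against the first value of y only.

```r
x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)

# What R does implicitly in x - y: repeat y until it has length(x) elements
y_recycled <- rep_len(y, length(x))  # 1.12345 2.12345 1.12345

# Same numbers as abs(x - y), but without the warning
abs(x - y_recycled)
```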
My current solution is to use sapply() over the elements of y to build an n x m logical matrix (3 x 2 here), then use apply() to check whether any() entry in each row (one row per value of x) is TRUE:
apply(sapply(y, function(y_val) near(x, y_val, 0.0001)), 1, any)
[1] TRUE TRUE FALSE
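To see the shape of the intermediate object, here is a sketch that stores the sapply() result and inspects it: each column corresponds to one value of y and each row to one value of x. For a logical matrix without NA values, rowSums(m) > 0 gives the same answer as apply(m, 1, any) without calling a function per row.

```r
near <- function(x, y, tol = .Machine$double.eps^0.5) abs(x - y) < tol

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)

# One column per element of y, one row per element of x
m <- sapply(y, function(y_val) near(x, y_val, 0.0001))
dim(m)  # 3 2

# Row-wise "any" via counting TRUEs (equivalent when m contains no NAs)
rowSums(m) > 0
```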
But this seems cumbersome! If y had thousands of values, wouldn't I be creating a temporary matrix with thousands of columns? Is there a better way?
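One way to avoid the length(x) by length(y) matrix, sketched here as an alternative technique rather than anything from the thread, is to sort y once and use findInterval() to locate, for each x, its nearest neighbours in y from below and from above. The nearest element of y is always one of those two, so only two distances per element of x need checking. near_in() is a hypothetical helper name, and the strict < comparison is chosen to match near().

```r
# Hypothetical helper: TRUE where x[i] lies within tol of some element of y.
# Sorting y and using findInterval() keeps memory at O(length(x) + length(y))
# instead of allocating a length(x) x length(y) matrix.
near_in <- function(x, y, tol = .Machine$double.eps^0.5) {
  ys <- sort(unique(y))  # sort() also drops NA values
  k <- length(ys)
  if (k == 0) return(rep(FALSE, length(x)))
  # i is the index of the largest ys <= x (0 when x is below every ys)
  i <- findInterval(x, ys)
  # Closest candidate below and above x; a missing side becomes -Inf / Inf
  lower <- ifelse(i >= 1, ys[pmax(i, 1)], -Inf)
  upper <- ifelse(i < k, ys[pmin(i + 1, k)], Inf)
  pmin(x - lower, upper - x) < tol
}

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
near_in(x, y, tol = 1e-4)  # TRUE TRUE FALSE
```

The cost is one sort of y plus a binary search per element of x, so it should scale better than the matrix approach when both vectors are large.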
You could floor() (or round()) the values after dividing them by the tolerance:
tol <- 1e-5
floor(x/tol)
#> [1] 112345 212345 312345
floor(y/tol)
#> [1] 112345 212345
floor(x/tol) %in% floor(y/tol)
#> [1] TRUE TRUE FALSE
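One property of this binning approach worth checking: floor(x/tol) only matches values that land in the same bin of width tol, so two numbers that are closer than tol but straddle a bin boundary are reported as not matching. The values a and b below are made up purely to illustrate this.

```r
tol <- 1e-5
a <- 1.000009
b <- 1.000011

abs(a - b) < tol                    # TRUE: the values are only about 2e-6 apart
floor(a / tol) %in% floor(b / tol)  # FALSE: they fall in bins 100000 and 100001
```

Conversely, two values in the same bin are always less than tol apart, so (ignoring floating-point rounding in the division) this approach can miss near matches but should not report false ones.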