Filter a vector based on approximate multiple values


Suppose I want to filter a vector x, keeping only the values that appear approximately in a vector y:

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)

If I wanted an exact comparison, I'd use %in%:

x %in% y
[1] FALSE FALSE FALSE

But the result I need is:

# something like: x %near_in% y
[1] TRUE TRUE FALSE

The help file for dplyr::near(x, y, tol) describes the arguments as "x, y: Numeric vectors to compare", but that is not the whole story: y has to be either the same length as x or a single value, because all near() does is take abs() of the element-wise difference:

near <- function (x, y, tol = .Machine$double.eps^0.5) 
{
    abs(x - y) < tol
}

If y is shorter than x, the subtraction x - y recycles y's values until they cover the length of x (with a warning when the lengths are not multiples of each other), and we get:

abs(x - y)
[1] 0.000006789 0.000006789 2.000006789
Warning message:
In x - y : longer object length is not a multiple of shorter object length

My current solution is to use sapply() on y's elements to create an n x m matrix (3 x 2 here, one row per value of x and one column per value of y), then use apply() to check whether any() entry in each row is TRUE:

apply(sapply(y, function(y_val) near(x, y_val, 0.0001)), 1, any)
[1] TRUE TRUE FALSE

But this seems cumbersome! If I had thousands of values in y, wouldn't I be creating a temporary matrix with thousands of columns? Is there a better way?


Solution

  • You could floor or round both vectors to the tolerance and compare the results with %in%:

    tol <- 1e-5
    floor(x/tol)
    #> [1] 112345 212345 312345
    floor(y/tol)
    #> [1] 112345 212345
    floor(x/tol) %in% floor(y/tol)
    #> [1]  TRUE  TRUE FALSE
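
One caveat with flooring or rounding is that two values closer than tol can still land in different bins when they straddle a bin boundary. A minimal sketch of an alternative, assuming y can be sorted: look up each x's nearest neighbours in sorted y with findInterval() and test the distance directly, which avoids building an n x m matrix. The names near_in and %near_in% are hypothetical (the operator name is taken from the question), and the default tolerance of 1e-4 is simply the one used in the question's sapply() call.

```r
# Hypothetical helper: TRUE where x is within tol of at least one value in y
near_in <- function(x, y, tol = 1e-4) {
  y <- sort(unique(y))                      # findInterval() needs sorted input
  if (length(y) == 0L) return(rep(FALSE, length(x)))
  # Index of the largest y that is <= each x (0 when x is below every y)
  lo <- findInterval(x, y)
  # Candidate neighbours: the y at or below x and the next y above it
  below <- y[pmax(lo, 1L)]
  above <- y[pmin(lo + 1L, length(y))]
  # x matches if the closer of the two candidates is within tol
  pmin(abs(x - below), abs(x - above)) < tol
}

# Hypothetical infix wrapper using the default tolerance
`%near_in%` <- function(x, y) near_in(x, y)

x <- c(1.123456789, 2.123456789, 3.123456789)
y <- c(1.12345, 2.12345)
x %near_in% y
#> [1]  TRUE  TRUE FALSE

# Values that straddle a bin boundary: flooring misses them, near_in() does not
floor(0.99999999 / 1e-5) %in% floor(1.00000001 / 1e-5)
#> [1] FALSE
near_in(0.99999999, 1.00000001, tol = 1e-5)
#> [1] TRUE
```

Because only two candidates are checked per element of x, memory use stays proportional to length(x) + length(y), at the cost of sorting y once.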