
Closest subsequent index for a specified value


Consider a vector:

int = c(1, 1, 0, 5, 2, 0, 0, 2)

For each element, I'd like to get the distance in positions (not the difference in values) to the closest subsequent element equal to a specified value. The first argument of the function should be the vector, and the second should be the value to search for.

For instance,

f(int, 0)
# [1] 2 1 0 2 1 0 0 NA

Here, the first element of the vector (1) is two positions away from the first subsequent 0 (3 - 1 = 2), so it should return 2. The second element is then 1 position away from that 0 (3 - 2 = 1). When no subsequent value matches the specified value, return NA (here that is the case for the last element, because no subsequent value is 0).

Other examples:

f(int, 1)
# [1] 0 0 NA NA NA NA NA NA

f(int, 2) 
# [1] 4 3 2 1 0 2 1 0

f(int, 3) 
# [1] NA NA NA NA NA NA NA NA

This should also work for character vectors:

char = c("A", "B", "C", "A", "A")

f(char, "A") 
# [1] 0 2 1 0 0

Solution

  • Find the positions where the vector equals the value (this works for numeric or character vectors)

    int = c(1, 1, 0, 5, 2, 0, 0, 2)
    value = 0
    idx = which(int == value)
    ## [1] 3 6 7
    

    Expand the index so that each position holds the index of the nearest subsequent occurrence of the value, with NA after the last occurrence in int.

    nearest = rep(NA, length(int))
    nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    ## [1]  3  3  3  6  6  6  7 NA
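
The expansion step may be easier to follow when split into its parts. The sketch below reuses the positions from the example (3, 6, 7); the names `runs` and `expanded` are introduced here only for illustration and are not part of the answer.

```r
# Positions of 0 in int, as found by which(int == 0) in the example
idx <- c(3L, 6L, 7L)

# Gap between consecutive matches, counting from position 0:
# the match at 3 "covers" positions 1-3, the match at 6 covers 4-6,
# and the match at 7 covers only position 7
runs <- diff(c(0L, idx))

# Repeat each match position once for every position it covers
expanded <- rep(idx, runs)

print(runs)
print(expanded)
```

The length of `expanded` equals `max(idx)`, which is why it can be assigned to `nearest[1:max(idx)]` while the remaining positions keep their NA.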
    

    Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

    abs(seq_along(int) - nearest)
    ## [1]  2  1  0  2  1  0  0 NA
    

    Written as a function

    f <- function(x, value) {
        idx = which(x == value)
        nearest = rep(NA, length(x))
        if (length(idx)) # non-NA values only if `value` in `x`
            nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
        abs(seq_along(x) - nearest)
    }
    

    We have

    > f(int, 0)
    [1]  2  1  0  2  1  0  0 NA
    > f(int, 1)
    [1]  0  0 NA NA NA NA NA NA
    > f(int, 2)
    [1] 4 3 2 1 0 2 1 0
    > f(char, "A")
    [1] 0 2 1 0 0
    > f(char, "B")
    [1]  1  0 NA NA NA
    > f(char, "C")
    [1]  2  1  0 NA NA
    

    The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
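
One way to gain confidence in the vectorized version is to compare it against a deliberately naive reference. The sketch below repeats `f` from above so that it runs on its own; `f_ref` and `cases` are hypothetical names added for this check, and `f_ref` is a slow, loop-style restatement of the question's definition, not a replacement for `f`.

```r
# The function from the answer, repeated so this block is self-contained
f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

# Hypothetical reference: for each position i, scan from i to the end
# and report the offset of the first match, or NA if there is none
f_ref <- function(x, value) {
    n <- length(x)
    vapply(seq_along(x), function(i) {
        hits <- which(x[i:n] == value)  # matches at or after position i
        if (length(hits)) hits[1] - 1L else NA_integer_
    }, integer(1))
}

int <- c(1, 1, 0, 5, 2, 0, 0, 2)
char <- c("A", "B", "C", "A", "A")

# Compare both implementations on the cases from the question
cases <- list(
    list(int, 0), list(int, 1), list(int, 2), list(int, 3),
    list(char, "A"), list(char, "B"), list(char, "C")
)
agree <- vapply(cases, function(a) {
    identical(f(a[[1]], a[[2]]), f_ref(a[[1]], a[[2]]))
}, logical(1))
print(all(agree))
```

The reference does work proportional to the remaining length at every position, so it is only suitable for checking results on short vectors.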