
Find index of two identical values in succession for the first time


Here are some example vectors to reproduce the problem:

a <- c(14,26,38,64,96,127,152,152,152,152,152,152)
b <- c(4,7,9,13,13,13,13,13,13,13,13,13,13,13)
c <- c(62,297,297,297,297,297,297,297,297,297,297,297)

At some point, a certain value is repeated until the end of the vector. I need to get exactly the index where this value appears for the first time.

So in this case the output would be 7, 4, 2: in a, 152 starts at the 7th position; in b, 13 starts at the 4th position; and in c, 297 starts at the 2nd position. I hope this is clear.

Does anybody have a hint on how to get this automatically?

Edit: the data is always increasing, and once a value starts repeating it continues until the end. In this kind of analysis there will always be a repetition, at least in the last two values.
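A minimal sketch of how the assumptions in this edit could be checked in R before relying on them. The helper name `meets_assumptions` is hypothetical, and it assumes "always increasing" means strictly increasing before the final run, with that final run having length at least 2.

```r
# Hypothetical helper: TRUE when x matches the assumptions in the edit.
meets_assumptions <- function(x) {
  if (length(x) < 2) return(FALSE)   # need at least the last two values
  runs <- rle(x)$lengths             # lengths of consecutive equal runs
  k <- length(runs)
  all(diff(x) >= 0) &&               # values never decrease
    all(runs[-k] == 1) &&            # no repeats before the final run
    runs[k] >= 2                     # the final value repeats at least once
}

a <- c(14,26,38,64,96,127,152,152,152,152,152,152)
b <- c(4,7,9,13,13,13,13,13,13,13,13,13,13,13)
c <- c(62,297,297,297,297,297,297,297,297,297,297,297)

sapply(list(a, b, c), meets_assumptions)
# [1] TRUE TRUE TRUE
meets_assumptions(c(1, 1, 2, 3, 3))  # an earlier repeat breaks the assumption
# [1] FALSE
```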


Solution

  • You could use rle() to compute the run-length encoding of the vector, drop the final run (the repeated value), sum the lengths of the remaining runs and add 1:

    get_index <- \(x) sum(head(rle(x)$lengths, -1)) + 1
    sapply(list(a, b, c), get_index)
    # [1] 7 4 2
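    To make the one-liner easier to follow, here are its intermediate steps on `a` (the variable name `r` is just for illustration). Note that this approach counts every run before the final one, so it still works if a value repeats earlier in the vector.

    ```r
    a <- c(14,26,38,64,96,127,152,152,152,152,152,152)

    r <- rle(a)                    # run-length encoding of a
    r$lengths                      # length of each run of equal values
    # [1] 1 1 1 1 1 1 6
    r$values                       # the value of each run
    # [1]  14  26  38  64  96 127 152
    head(r$lengths, -1)            # drop the final run (the repeated value)
    # [1] 1 1 1 1 1 1
    sum(head(r$lengths, -1)) + 1   # elements before the final run, plus 1
    # [1] 7
    ```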
    

    Rcpp solution

    If your vectors are very long and the last value is only repeated near the end, computing the length of every run with rle() does unnecessary work, so the above will be inefficient. It is better to start from the end of the vector and work backwards until you find a different value:

    Rcpp::cppFunction('
    int get_index2(NumericVector x) {
        int n = x.size();
        if (n == 0) return NA_INTEGER; // empty vector: no index to report
        double last_value = x[n - 1];
        // walk backwards from the second-to-last element
        for (int i = n - 2; i >= 0; --i) {
            if (x[i] != last_value) {
                return i + 2; // +1 as it is next element; +1 for 1-indexing
            }
        }
        return 1; // all elements are the same
    }
    ')
    
    sapply(list(a,b,c), get_index2)
    # [1] 7 4 2
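    If a C++ toolchain is not available, the same backwards scan can be written in plain R. This is a sketch under the assumption that `x` is non-empty; the name `get_index_r` is hypothetical. An R-level loop is slower per element than the Rcpp version, but it still only visits the final run plus one element.

    ```r
    # Hypothetical pure-R counterpart of get_index2: walk backwards from
    # the end until a value differs from the last one.
    get_index_r <- function(x) {
      n <- length(x)
      last_value <- x[n]
      i <- n - 1
      while (i >= 1 && x[i] == last_value) {
        i <- i - 1
      }
      i + 1  # first position (1-based) of the final run
    }

    a <- c(14,26,38,64,96,127,152,152,152,152,152,152)
    b <- c(4,7,9,13,13,13,13,13,13,13,13,13,13,13)
    c <- c(62,297,297,297,297,297,297,297,297,297,297,297)

    sapply(list(a, b, c), get_index_r)
    # [1] 7 4 2
    ```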
    

    data.table solution

    Given your update to the question (the data is always increasing, so the only repeats are in the final run), another way to approach this is to count the unique values:

    sapply(list(a,b,c), data.table::uniqueN)
    # [1] 7 4 2
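    A base R equivalent without the data.table dependency is `length(unique(x))`. The sketch below also shows why the stated assumptions matter: with an earlier repeat, the count of unique values no longer matches the start of the final run, while the rle() approach still does. The vector `x` is a made-up counterexample.

    ```r
    get_index <- function(x) sum(head(rle(x)$lengths, -1)) + 1

    a <- c(14,26,38,64,96,127,152,152,152,152,152,152)
    b <- c(4,7,9,13,13,13,13,13,13,13,13,13,13,13)
    c <- c(62,297,297,297,297,297,297,297,297,297,297,297)

    # Number of distinct values = position where the last value first appears,
    # but only when nothing repeats before the final run.
    sapply(list(a, b, c), function(x) length(unique(x)))
    # [1] 7 4 2

    x <- c(1, 1, 2, 3, 3)   # made-up vector with an earlier repeat
    length(unique(x))
    # [1] 3
    get_index(x)
    # [1] 4
    ```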
    

    This is not conceptually different from the nice answer by zx8754, and with vectors of this size it is unlikely to be meaningfully different in speed; it could even be slower. However, it is faster for very large vectors.
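    To compare speeds on your own machine, a rough timing sketch with base `system.time()` is below. The test vector is made up (one million distinct increasing values followed by one million copies of a larger value), and timings will vary by hardware, so none are reported here. If data.table is installed, `system.time(data.table::uniqueN(x))` can be added for comparison.

    ```r
    get_index <- function(x) sum(head(rle(x)$lengths, -1)) + 1

    # Made-up large input: the final value first appears at position 1e6 + 1.
    x <- c(seq_len(1e6), rep(1e6 + 1, 1e6))

    system.time(r1 <- get_index(x))          # rle() over the whole vector
    system.time(r2 <- length(unique(x)))     # base R count of distinct values
    c(r1, r2)
    # [1] 1000001 1000001
    ```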