Search code examples
rfor-loopuniquedistinctdifference

all.different function in R, apply it in a loop?


I have a vector of numbers. For instance like this, with only a few unique values:

set.seed(2)
a = rpois(1000, 0.3)
head(a, 20)
#### [1] 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 2 0 0 0

Now what I need is to find for each number, if itself, the previous and the following element are all different. I tried to find a function all.different or all distinct in R, unsuccessfully, so I made it:

all.diff = function(num) NROW(unique(num))==NROW(num)

Then I came up with a for loop like this:

ConsecutiveDifferent = function(vector) {
  output = numeric(NROW(vector)-2)
  for (i in 2:(NROW(vector)-1) ) {
    trio <- c(vector[i-1], vector[i], vector[i+1])
    if ( all.diff(trio) ) output[i]<-1
  }
  return(output)
}
res = ConsecutiveDifferent(a)
head(res, 20)
#### [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0

It does the job but since my vector has a length of several hundred millions, I was wondering if there was a better way to do this than a loop.

Thanks,

EDIT

Thanks for getting me so many solutions! I coulnd't decide whose answer has to be accepted so I did a microbenckmark (length=50000) and the prize goes to Franck.. Also thanks for the extensive answer. enter image description here


Solution

  • rle. This is very particular to the case of trios:

    w = with(rle(a), cumsum(lengths)[ 
      lengths == 1L & c(NA, values[-length(values)]) != c(values[-1], NA)
    ])
    
    res2    = c(NA, logical(length(a)-2), NA)
    res2[w] = TRUE
    
    identical(res, res2) # TRUE
    

    combn. I might do

    a_shift    = list(c(NA, a[-length(a)]), a, c(a[-1], NA))
    n_distinct = rowSums(combn(a_shift, 2, FUN = function(x) x[[1]] != x[[2]]))
    res        = n_distinct == length(a_shift)
    

    To examine whether it worked...

    head(cbind.data.frame(a, res), 20)
    
    
       a   res
    1  0    NA
    2  0 FALSE
    3  0 FALSE
    4  0 FALSE
    5  1 FALSE
    6  1 FALSE
    7  0 FALSE
    8  1 FALSE
    9  0 FALSE
    10 0 FALSE
    11 0 FALSE
    12 0 FALSE
    13 1 FALSE
    14 0 FALSE
    15 0 FALSE
    16 1  TRUE
    17 2  TRUE
    18 0 FALSE
    19 0 FALSE
    20 0 FALSE
    

    This can be extended to looking further ahead and behind by extending a_shift, which can be easily done with the shift function from data.table:

    library(data.table)
    n_back = 1
    n_fwd  = 1
    a_shift = setDT(list(a))[, c(
      shift(V1, n_back:0, type="lag"), 
      list(shift(V1, n_fwd, type="lead"))
    )]
    a_shift[, r := .I]
    
    resDT = melt(a_shift, id = "r")[, .(res = 
      if (any(is.na(value))) NA else uniqueN(value) == n_fwd + n_back + 1L
    ), by=r][, a := a]
    
     identical(res, resDT$res) # TRUE
    

    ... which may look arcane, but that's more to do with my coding style than the package.