Tags: r, matlab, programming-languages, dsl, instructions

Function/instruction to count number of times a value has already been seen


I'm trying to identify if MATLAB or R has a function that resembles the following.

Say I have an input vector v.

v = [1, 3, 1, 2, 4, 2, 1, 3]

I want to generate a vector w of the same length as v. Each element w[i] should tell me how many times the value v[i] has already been encountered in v, i.e. among all elements of v up to, but not including, position i. In this example:

w = [0, 0, 1, 0, 0, 1, 2, 1]

I'm really looking to see if any statistical or domain-specific languages have a function/instruction like this and what it might be called.


Solution

  • In R, you can try this:

     v <- c(1,3,1,2,4,2,1,3)
     ave(v, v, FUN=seq_along)-1
     #[1] 0 0 1 0 0 1 2 1
    

    Explanation

     # Passing seq_along(v) as the first argument is safer: it also works when v is not numeric, e.g. a factor.
     ave(seq_along(v), v, FUN=seq_along)
     #[1] 1 1 2 1 1 2 3 2
    

    Here, ave() groups the positions of v by the values of v. Within each group, seq_along numbers the elements 1, 2, 3, and so on, in order of appearance. In v, the value 1 occurs at positions 1, 3 and 7, so those positions receive 1, 2 and 3. Subtracting 1 makes the count start from 0.
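
    As a small illustration of that grouping (a sketch only, reusing the example vector from the question), splitting the positions by value shows which indices fall into each group. The variable name pos is chosen here for illustration.

```r
v <- c(1, 3, 1, 2, 4, 2, 1, 3)

# Split the positions 1..8 by the value found at each position.
pos <- split(seq_along(v), v)

pos[["1"]]  # positions holding the value 1
#[1] 1 3 7
pos[["3"]]  # positions holding the value 3
#[1] 2 8
```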

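    To understand it better, the same steps can be written out explicitly:

     lst1 <- split(v,v)                # group the values of v by value
     lst2 <- lapply(lst1, seq_along)   # number the members of each group 1, 2, 3, ...
     unsplit(lst2, v)                  # put the numbers back in the original order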
     #[1] 1 1 2 1 1 2 3 2
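
    If this is needed in several places, the ave() call could be wrapped in a small helper. The sketch below is one possible way to do that; prior_count is a hypothetical name made up for this example, not an existing R function. It groups on seq_along(x) so that factor input should work as well.

```r
# prior_count() is a hypothetical helper name, not part of base R.
prior_count <- function(x) {
  # Group the positions of x by the values of x, number each group's
  # members 1, 2, 3, ... in order of appearance, then shift to start at 0.
  ave(seq_along(x), x, FUN = seq_along) - 1L
}

prior_count(c(1, 3, 1, 2, 4, 2, 1, 3))
#[1] 0 0 1 0 0 1 2 1
prior_count(factor(c("a", "b", "a", "a")))
#[1] 0 0 1 2
```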
    

    Using data.table

      library(data.table)
      DT <- data.table(v, ind=seq_along(v))
      DT[, n:=(1:.N)-1, by=v][,n[ind]]
      #[1] 0 0 1 0 0 1 2 1
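
    For comparison, the same result can be produced with an explicit running tally, which is roughly what one would write in a language without a grouping helper. This is only a sketch: running_tally is a hypothetical name, values are keyed by their character representation, and it assumes x contains no NA or empty-string values.

```r
# running_tally() is a hypothetical name used only for this sketch.
running_tally <- function(x) {
  seen <- new.env(hash = TRUE)   # maps each value (as a string key) to its count so far
  out <- integer(length(x))
  for (i in seq_along(x)) {
    key <- as.character(x[i])    # assumption: no NA or "" among the values
    n <- seen[[key]]             # NULL if this value has not been seen yet
    if (is.null(n)) n <- 0L
    out[i] <- n                  # number of earlier occurrences of x[i]
    seen[[key]] <- n + 1L        # record the current occurrence
  }
  out
}

running_tally(c(1, 3, 1, 2, 4, 2, 1, 3))
#[1] 0 0 1 0 0 1 2 1
```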