I'm trying to find out whether MATLAB or R has a function that does the following.
Say I have an input vector v:
v = [1, 3, 1, 2, 4, 2, 1, 3]
I want to generate a vector w of the same length as v. Each element w[i] should tell me how many times the value v[i] has already appeared in v, i.e. among all elements of v up to, but not including, position i. In this example:
w = [0, 0, 1, 0, 0, 1, 2, 1]
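To pin down the intended semantics, here is a minimal base-R sketch of the straightforward loop: walk through v once and keep a running count per value. The function name count_prior_loop is hypothetical, and the sketch assumes each value can be used as a hash key via as.character() (so numbers with the same printed representation are treated as equal).

```r
# Hypothetical reference implementation: one pass over v with a running count per value.
count_prior_loop <- function(v) {
  seen <- new.env(hash = TRUE)   # hash map: value (as a string key) -> times seen so far
  w <- integer(length(v))
  for (i in seq_along(v)) {
    key <- as.character(v[i])
    n <- seen[[key]]             # NULL if this value has not been seen yet
    if (is.null(n)) n <- 0L
    w[i] <- n                    # earlier occurrences only, position i excluded
    seen[[key]] <- n + 1L        # include position i for later elements
  }
  w
}

v <- c(1, 3, 1, 2, 4, 2, 1, 3)
count_prior_loop(v)
#[1] 0 0 1 0 0 1 2 1
```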
I'm mainly interested in whether any statistical or domain-specific language provides a built-in function or instruction for this, and what it is called.
In R, you can use ave():
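v <- c(1,3,1,2,4,2,1,3)
ave(v, v, FUN=seq_along)-1
#[1] 0 0 1 0 0 1 2 1
# Passing seq_along(v) as the first argument is safer when v is not numeric (e.g. a factor
# or character vector), because ave() returns a result of the same type as its first argument.
# This variant gives 1-based counts; subtract 1 as above to start from 0.
ave(seq_along(v), v, FUN=seq_along)
#[1] 1 1 2 1 1 2 3 2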
Here, we group the positions of v by their value. Within each group, seq_along() numbers the elements 1, 2, 3, etc. in order of appearance. For example, the elements of group 1 are at positions 1, 3 and 7, so those positions receive 1, 2 and 3. Subtracting 1 makes the numbering start from 0, so each entry becomes the number of earlier occurrences of that value.
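The same ave() approach can be wrapped in a small helper that also accepts character or factor input, following the seq_along(v) remark above. This is a sketch: the name count_prior is hypothetical, and it assumes x contains no NA values (ave() leaves positions whose group is NA with their original seq_along() index instead of a count).

```r
# Hypothetical helper: number of earlier occurrences of each element of x.
# Grouping seq_along(x) (an integer vector) instead of x itself keeps the
# result integer even when x is character or a factor.
count_prior <- function(x) {
  ave(seq_along(x), x, FUN = seq_along) - 1L
}

count_prior(c(1, 3, 1, 2, 4, 2, 1, 3))
#[1] 0 0 1 0 0 1 2 1
count_prior(c("a", "c", "a", "b", "d", "b", "a", "c"))
#[1] 0 0 1 0 0 1 2 1
count_prior(factor(c("x", "y", "x")))
#[1] 0 0 1
```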
To understand it better, here is the same computation written out with split() and unsplit():
lst1 <- split(v,v)
lst2 <- lapply(lst1, seq_along)
unsplit(lst2, v)
#[1] 1 1 2 1 1 2 3 2
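The split()/unsplit() round trip can also be expressed with a single stable sort, which avoids building a list of groups. This is a different technique from the one above, offered as a sketch: the name count_prior_sort is hypothetical, and it assumes v is a plain numeric or character vector (not a factor, which rle() rejects) with no NA values.

```r
# Hypothetical sort-based variant: order() groups equal values together
# (like split()), and assigning through w[o] restores the original order
# (like unsplit()).
count_prior_sort <- function(v) {
  o <- order(v)                        # stable: ties keep their original relative order
  run_lengths <- rle(v[o])$lengths     # sizes of the blocks of equal values
  w <- integer(length(v))
  w[o] <- sequence(run_lengths) - 1L   # 0, 1, 2, ... within each block, written back
  w
}

v <- c(1, 3, 1, 2, 4, 2, 1, 3)
count_prior_sort(v)
#[1] 0 0 1 0 0 1 2 1
```

Stability of order() is what guarantees that, within each block of equal values, the earliest occurrence in v gets 0.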
Using data.table, you can number the rows within each group of v and subtract 1:
library(data.table)
DT <- data.table(v, ind=seq_along(v))
DT[, n:=(1:.N)-1, by=v][,n[ind]]
#[1] 0 0 1 0 0 1 2 1