Consider the following vector (or data frame or data.table):
a = data.frame(x = c(2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1))
x represents a 'state', either 1 or 2. The vector represents spatial data, and I am looking to get the average segment length per state. In other words, for fixed state = 2 there are two segments, 2, 2, 2, 2 and 2, 2, 2, with lengths 4 and 3. Thus the 'avg' length of this state is (4 + 3)/2 = 3.5.
My actual dataset has states from 1 to 9 and more than 1,000,000 points in the vector. My difficulty is really in 'breaking' up the vector and counting the segments. I am working with R, but pseudocode would be fine.
Note: if anyone can come up with a better title, please let me know or submit an edit.
You can solve this with a combination of ?rle and ?tapply. rle finds each run of consecutive equal elements and stores the run lengths in lengths and the corresponding values in values. tapply is then used to calculate the groupwise mean of the run lengths:
r <- rle(a$x)
tapply(r$lengths, INDEX=r$values, FUN=mean)
# 1 2
# 3.5 3.5
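To see why this works, it can help to look at the intermediate result of rle before averaging. The sketch below rebuilds the toy data from the question, prints one row per segment, and then applies the same two calls to a simulated vector with states 1 to 9 and 1,000,000 points. The names runs, avg_len, big, rb and big_avg are made up for illustration, and the simulated vector is only a stand-in for the real dataset, not actual data.

```r
# Toy data from the question
a <- data.frame(x = c(2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1))

# Run-length encoding: one entry per segment of equal consecutive values
r <- rle(a$x)

# One row per segment, to make the intermediate result visible
runs <- data.frame(state = r$values, length = r$lengths)
print(runs)
#   state length
# 1     2      4
# 2     1      3
# 3     2      3
# 4     1      4

# Mean segment length per state
avg_len <- tapply(r$lengths, INDEX = r$values, FUN = mean)

# Hypothetical stand-in for the large dataset: random states 1 to 9
set.seed(1)
big <- sample(1:9, size = 1e6, replace = TRUE)
rb <- rle(big)
big_avg <- tapply(rb$lengths, INDEX = rb$values, FUN = mean)
```

Both rle and tapply are vectorised base R functions, so no explicit loop over the vector is needed.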