
Calculating the average length of segments in a split vector


Consider the following vector (or data frame or data.table):

a = data.frame(x = c(2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1))

x represents a 'state', either 1 or 2. The vector holds spatial data, and I want the average segment length per state, where a segment is a run of consecutive identical values. For example, state 2 occurs in two segments, 2, 2, 2, 2 and 2, 2, 2, with lengths 4 and 3, so the average length for this state is (4 + 3)/2 = 3.5.

My actual dataset has states from 1 to 9 and over 1,000,000 points in the vector. My difficulty is really in breaking up the vector and counting the segments. I am working in R, but pseudocode would be fine.

Note: if anyone can come up with a better title, please let me know or submit an edit.


Solution

  • You can solve this with a combination of ?rle and ?tapply. rle finds the runs of consecutive equal values, storing each run's length in lengths and the value it repeats in values. tapply then computes the mean run length for each state:

    r <- rle(a$x)
    tapply(r$lengths, INDEX=r$values, FUN=mean)
    #   1   2 
    # 3.5 3.5
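
Since the question asks how the vector actually gets broken up, here is a rough sketch of the same computation done by hand in base R, without rle. The variable names (run_start, run_id, run_length, run_state, avg_length) are made up for illustration, and the sketch assumes a non-empty vector with no NA values. It should give the same result as the rle/tapply version above.

```r
# Example data from the question
x <- c(2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1)

# A run starts at position 1 and wherever a value differs from its predecessor
run_start <- c(TRUE, x[-1] != x[-length(x)])

# Number the runs 1, 2, 3, ... so every element carries the id of its run
run_id <- cumsum(run_start)

# Length of each run (count of elements per run id) and the state of each run
run_length <- tabulate(run_id)
run_state  <- x[run_start]

# Mean run length per state
avg_length <- tapply(run_length, run_state, mean)
avg_length
#   1   2
# 3.5 3.5
```

Every step here is vectorised, so it should scale to a vector with a million points, although rle is shorter and does essentially the same bookkeeping internally.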