R: splitting dataframe into distinct subgroups containing sequence of groups


This question is similar to one already answered: R: Splitting dataframe into subgroups consisting of every consecutive 2 groups

However, rather than splitting into subgroups that share a type with the neighbouring subgroup, I need non-overlapping subgroups, each containing two consecutive types. The types in my actual data also have differing numbers of rows.

df <- data.frame(ID=c('1','1','1','1','1','1','1'), Type=c('a','a','b','c','c','d','d'), value=c(10,2,5,3,7,3,9))

   ID Type value
1  1    a    10
2  1    a     2
3  1    b     5
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9

So subgroup 1 would be Type a and b:

   ID Type value
1  1    a    10
2  1    a     2
3  1    b     5

And subgroup 2 would be Type c and d:

   ID Type value
4  1    c     3
5  1    c     7
6  1    d     3
7  1    d     9

I have tried adapting the code from that previous question, but I can't figure out how to do this without the same Type appearing in more than one subgroup. Any help would be greatly appreciated. Thanks!

EDIT: Thanks for pointing out that I hadn't included the correct link.
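One way to state the requirement precisely: number each run of identical consecutive Type values, then map runs 1 and 2 to subgroup 1, runs 3 and 4 to subgroup 2, and so on. The sketch below builds that mapping in base R with cumsum() over the rows where Type changes; this is a swapped-in technique rather than code from the linked question, and the names starts, run_id, pair_id and subgroups are purely illustrative.

```r
df <- data.frame(ID = c('1','1','1','1','1','1','1'),
                 Type = c('a','a','b','c','c','d','d'),
                 value = c(10, 2, 5, 3, 7, 3, 9))

# A new run starts on the first row and wherever Type differs from the row above
starts <- c(TRUE, df$Type[-1] != df$Type[-nrow(df)])
run_id <- cumsum(starts)        # 1 1 2 3 3 4 4

# Pair consecutive runs without overlap: runs 1-2 -> 1, runs 3-4 -> 2, ...
pair_id <- (run_id + 1) %/% 2   # 1 1 1 2 2 2 2

subgroups <- split(df, pair_id)
subgroups
```

Because the pairing works on runs rather than rows, it does not matter how many rows each type has.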


Solution

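  • Here is an rle-based approach, written as a function. Pass the data.frame and the name of the column to split on as a character string. It run-length encodes the column, relabels every second run with the label of the run before it, and then splits the rows on the expanded labels.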

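    df <- data.frame(ID=c('1','1','1','1','1','1','1'), 
                     Type=c('a','a','b','c','c','d','d'), 
                     value=c(10,2,5,3,7,3,9))
    
    split_two <- function(x, col) {
      # Run-length encode the column (as.character() so factor columns work too)
      r <- rle(as.character(x[[col]]))
      # Give runs 2, 4, ... the label of runs 1, 3, ...;
      # a trailing unpaired run keeps its own label (no recycling warning)
      n <- length(r$values) %/% 2
      r$values[2 * seq_len(n)] <- r$values[2 * seq_len(n) - 1]
      # Expand back to one label per row and split on it
      split(x, inverse.rle(r))
    }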
    split_two(df, "Type")
    #> $a
    #>   ID Type value
    #> 1  1    a    10
    #> 2  1    a     2
    #> 3  1    b     5
    #> 
    #> $c
    #>   ID Type value
    #> 4  1    c     3
    #> 5  1    c     7
    #> 6  1    d     3
    #> 7  1    d     9
    

    Created on 2023-02-09 with reprex v2.0.2
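Because split_two() names each subgroup after the first Type in its pair, two pairs that happen to start with the same Type (for example a, b, a, b) would be merged into one list element, and split() orders the result by label rather than by position. A minimal sketch that labels pairs by position instead, assuming a hypothetical helper split_k() with a group-size argument k (k = 2 reproduces the question):

```r
split_k <- function(x, col, k = 2) {
  # Run lengths of identical consecutive values in the column
  r <- rle(as.character(x[[col]]))
  # Position-based label per run: runs 1..k -> 1, runs k+1..2k -> 2, ...
  block <- ceiling(seq_along(r$lengths) / k)
  # Repeat each run's label once for every row in that run
  grp <- rep(block, times = r$lengths)
  split(x, grp)
}

df <- data.frame(ID = c('1','1','1','1','1','1','1'),
                 Type = c('a','a','b','c','c','d','d'),
                 value = c(10, 2, 5, 3, 7, 3, 9))
res <- split_k(df, "Type")

# Repeated labels stay in separate subgroups; the unpaired last run forms its own
df2 <- data.frame(Type = c('a','b','a','b','c'), value = 1:5)
res2 <- split_k(df2, "Type")
```

Here res has elements "1" (types a and b) and "2" (types c and d), and res2 keeps the two a/b pairs apart as "1" and "2", with the lone c run in "3".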