Search code examples
rdataframerle

How to count consecutive values across columns


In R, I have a dataframe that looks like this:

        2021    2020    2019    2018    2017    2015    2010    2006    2002    1998    1994    1990
1       6       6       6       6       4       6       6       6       6       6       6       6
2       6       6       6       6       6       6       6       6       6       3       4       4
3       7       6       6       6       4       6       6       6       6       6       6       6
4       6       6       6       6       6       6       6       6       6       6       6       6
5       4       4       7       6       4       6       6       6       6       6       4       6
6       6       6       6       6       6       4       6       6       6       2       6       6
...

What I want to do is count how many consecutive values there are from the first column (including the first column), and put the result in a new dataframe called 'age'. So the desired output from the above example would look like this:

        age
1       4
2       9
3       1
4       12
5       2
6       5
...

I've been playing around with rle, but have had no luck so far. Any ideas?

EDIT to add reproducible data:

df <- structure(list(LC_2021 = c(6, 6, 6, 6, 6, 6), LC_2020 = c(6, 
6, 6, 6, 4, 6), LC_2019 = c(6, 6, 6, 6, 6, 6), LC_2018 = c(6, 
6, 6, 6, 6, 6), LC_2017 = c(6, 6, 6, 6, 6, 6), LC_2015 = c(6, 
6, 6, 6, 6, 6), LC_2010 = c(6, 6, 6, 6, 6, 6), LC_2006 = c(4, 
4, 4, 6, 4, 4), LC_2002 = c(6, 6, 6, 6, 6, 6), LC_1998 = c(6, 
6, 6, 6, 6, 6), LC_1994 = c(6, 6, 6, 6, 6, 6), LC_1990 = c(6, 
6, 6, 6, 6, 6)), row.names = c(NA, 6L), class = "data.frame")

Solution

  • You can use apply to iterate through the rows, then extract the lengths from rle.

    data.frame(age = apply(df, 1, \(x) rle(x)$lengths[[1]]))
    
      age
    1   4
    2   9
    3   1
    4  12
    5   2
    6   5
    

    Data

    df <- structure(list(X2021 = c(6L, 6L, 7L, 6L, 4L, 6L), X2020 = c(6L, 
    6L, 6L, 6L, 4L, 6L), X2019 = c(6L, 6L, 6L, 6L, 7L, 6L), X2018 = c(6L, 
    6L, 6L, 6L, 6L, 6L), X2017 = c(4L, 6L, 4L, 6L, 4L, 6L), X2015 = c(6L, 
    6L, 6L, 6L, 6L, 4L), X2010 = c(6L, 6L, 6L, 6L, 6L, 6L), X2006 = c(6L, 
    6L, 6L, 6L, 6L, 6L), X2002 = c(6L, 6L, 6L, 6L, 6L, 6L), X1998 = c(6L, 
    3L, 6L, 6L, 6L, 2L), X1994 = c(6L, 4L, 6L, 6L, 4L, 6L), X1990 = c(6L, 
    4L, 6L, 6L, 6L, 6L)), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6"))