r  data-analysis  data-cleaning  longitudinal  rle

R: How to count the number of consecutive occurrences in a longitudinal database with a length condition?


I am working in R with a longitudinal database of individuals, with several rows per ID (named vn in the database) and their attributes in columns. The variable observation gives the year of observation, and maritalstatus indicates whether the person is married (1) or not (0).

Here is an overview of an individual in my database:

structure(list(vn = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), observation = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018)), class = "data.frame", row.names = c(NA, -19L))

I am looking for a way to create a new variable that gives the length of a run of consecutive occurrences, but only for the first run whose length is greater than or equal to 5. For this example it would be:

marital_length = c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0)

My current code (below) creates a variable that holds the maximum length of consecutive values, but I haven't found a way to add a condition so that only the first run of length >= 5 is counted.


maritalstatus_consecutive <- tapply(test$maritalstatus, INDEX = test$vn, most_consecutive_val)

# look up each row's value by ID name rather than by position
test$marital_length <- maritalstatus_consecutive[as.character(test$vn)]
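
The definition of most_consecutive_val is not shown in the snippet above. As a rough illustration only, a helper of that kind might look like the sketch below, assuming it returns the length of the longest run of a given value; the function body and the val argument are assumptions, not the original code.

```r
# Hypothetical reconstruction: the original definition is not shown in the post.
most_consecutive_val <- function(x, val = 1) {
  runs <- rle(x)                             # collapse x into runs of equal values
  hits <- runs$lengths[runs$values == val]   # lengths of the runs equal to val
  if (length(hits) == 0) 0L else max(hits)   # 0 when val never occurs
}

# The example individual from the question
test <- data.frame(
  vn = rep(1, 19),
  maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
  observation = 2000:2018
)

# One value per ID: the longest run of 1s in the example lasts 7 years
longest <- tapply(test$maritalstatus, INDEX = test$vn, most_consecutive_val)
longest
```

Under that assumption every row of this individual would receive 7 (the last marriage spell), not the 5 of the first spell that reaches the threshold, which is the gap the question describes.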

I also tried using min() (instead of max()), but then, for instance, if a person is married for 2 years, divorced, and then married for 6 years, the new variable will not show that she was married for 6 years unless I add the condition >= 5.

Does anyone have an idea for a code that could help me?


Solution

  • Maybe this is too convoluted, but it seems to work:

    df$marital_length <- with(df, ave(maritalstatus, vn, FUN = function(x) 
                    with(rle(x), rep(as.integer(seq_along(lengths) == 
                         which.max(lengths >= 5)) * lengths, lengths))))
    
    
    df$marital_length
    #[1] 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0
    

    which.max(lengths >= 5) gives the index of the first run whose length is greater than or equal to 5. Only that run keeps its length; every other run is multiplied by 0.
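
Two edge cases may be worth guarding against with the approach above. which.max() returns 1 when no element is TRUE, so an individual with no run of 5 or more would have the first run labelled with its own length instead of 0. The test lengths >= 5 also does not check the run's value, so a long run of 0s would qualify. A more defensive variant could look like the sketch below; the helper name first_long_run and its min_len and val arguments are illustrative and not part of the original answer.

```r
# Hypothetical helper: marks only the first run of `val` lasting at least `min_len`
first_long_run <- function(x, min_len = 5, val = 1) {
  runs <- rle(x)
  # index of the first qualifying run; NA if there is none
  first <- which(runs$values == val & runs$lengths >= min_len)[1]
  # keep the length of that run and zero out all the others
  keep <- seq_along(runs$lengths) %in% first
  rep(ifelse(keep, runs$lengths, 0L), runs$lengths)
}

df <- data.frame(
  vn = rep(c(1, 2, 3), times = c(19, 4, 7)),
  maritalstatus = c(
    0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,  # example from the question
    1, 1, 0, 1,                                                # no run of 5 or more
    0, 0, 0, 0, 0, 1, 1                                        # only a long run of 0s
  )
)

df$marital_length <- ave(df$maritalstatus, df$vn, FUN = first_long_run)

df$marital_length[df$vn == 1]
#[1] 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0
df$marital_length[df$vn != 1]
#[1] 0 0 0 0 0 0 0 0 0 0 0
```

Using which(...)[1] yields NA when nothing qualifies, and %in% then matches no run, so such individuals get 0 throughout.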