Tags: r, replace, data.table, sequence, gaps-in-data

How to replace a specific sequence of numbers (per row) with another sequence in a big data frame in R?


I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.

My data:

df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     0     1     0
4:     0     0     1     0     1     1
5:     0     0     1     0     0     1
6:     0     0     0     1     1     1

Some individuals have sighting gaps: they are seen in one year (1), not seen in one or more of the following years (0), and then spotted again in a later year (1). In total there are 400 rows (= individuals).

What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     1     1     0
4:     0     0     1     1     1     1
5:     0     0     1     1     1     1
6:     0     0     0     1     1     1

Zeros before the first 1 and after the last 1 should not be affected.

I have browsed many Stack Overflow questions, e.g.:

find and replace numeric sequence in r

Replace a sequence of values by group depending on preceeding values

However, I could not find a solution that works across all columns at once, on a row-by-row basis.

Thank you in advance for your advice! :)


Solution

  • We can do this row by row. An efficient option is dapply from the collapse package: loop over the rows, find the positions of the 1s, build the sequence from the first to the last of those positions, and replace those elements with 1.

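    library(collapse)
    # For each row, set everything between the first and the last 1 to 1.
    # Assumes every row contains at least one 1.
    dapply(df, MARGIN = 1, FUN = function(x)
         replace(x,  do.call(`:`, as.list(range(which(x == 1)))),  1 ))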
    

    -output

      Year1 Year2 Year3 Year4 Year5 Year6
    1     1     1     1     0     0     0
    2     0     1     1     1     0     0
    3     0     1     1     1     1     0
    4     0     0     1     1     1     1
    5     0     0     1     1     1     1
    6     0     0     0     1     1     1
    

    Another option is to get the row/column indices of the 1s with which and arr.ind = TRUE, build the sequence of column positions from the first to the last 1 in each row, and then assign through a two-column row/column index matrix. The assignment itself is vectorized.

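    # row/column positions of every 1
    ind <- which(df == 1, arr.ind = TRUE)
    # per row: all columns from the first to the last 1, as a (row, column) matrix;
    # as.character() keeps the real row numbers even if some rows contain no 1
    m1 <- as.matrix(transform(stack(lapply(split(ind[,2], ind[,1]), 
       function(x) x[1]:x[length(x)]))[2:1], ind = as.integer(as.character(ind))))
    df[m1] <- 1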
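
For comparison, the same row-wise idea can be sketched in base R, without collapse. This is only a minimal sketch: the helper name fill_row is hypothetical (it is not part of the answer above), and leaving rows without any 1 untouched is an assumption about how such rows should be handled. It uses the example data from the question, where the values are stored as character strings.

```r
# Example data from the question (0/1 stored as character strings)
df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))

# Hypothetical helper: fill every position between the first and last 1
fill_row <- function(x) {
  pos <- which(x == 1)          # positions where the animal was seen
  if (length(pos) > 0) {        # assumption: rows without any 1 stay as they are
    x[min(pos):max(pos)] <- 1
  }
  x
}

# apply() returns one column per input row, so transpose the result back
filled <- as.data.frame(t(apply(df, 1, fill_row)))

# apply() works on a character matrix here, so convert the columns to integer
filled[] <- lapply(filled, as.integer)
filled
```

With 400 rows, the cost of looping over rows with apply should be negligible. The explicit length check is the main difference from the one-liner above: a row of all zeros is returned unchanged instead of raising an error.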