Search code examples
rmatrixmissing-datadiagonal

R -Fill in data gaps with length of 'n' or less along a matrix diagonal


I'm working with some large matrices with values along a diagonal similar to the following.

ontrack <- matrix(c(
         runif(1),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         runif(1),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,runif(1),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,runif(1),runif(1),NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,runif(1),NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
         NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,runif(1)),
         nrow=14, byrow=T
         )

I would like to fill in data gaps of length 'n' or less to connect segments of the diagonal. Using the above matrix for example and filling in data gaps of 3 or less, I would like to go from this:

diag_indx <- which(!is.na(ontrack), arr.ind=T)

which gives

     row col
[1,]   1   1
[2,]   2   1
[3,]   3   3
[4,]   7   5
[5,]   7   6
[6,]   9   8
[7,]  14  13

to this

     row col
       1   1
       2   1
 newV  3   2
       3   3
  new  4   4
  new  5   4
  new  6   4
       7   5
       7   6
  new  8   7
       9   8
      14  13

For instances like newV, the result can be (2,2) or (3,2). My subsequent code uses the diag_indx matrix but data gaps could be filled in the ontrack matrix directly (using any value is ok) if it is more efficient.

In trying to work out a solution, I was envisioning finding the data gaps in the diag_indx matrix using this sequence length equation

seqle <- function(x, incr=1) { 
  if(!is.integer(x)) x <- as.integer(x) 
  n <- length(x)  
  y <- x[-1L] != x[-n] + incr 
  i <- c(which(y|is.na(y)),n) 
  list(lengths = diff(c(0L,i)),
       values = x[head(c(0L,i)+1L,-1L)]) 
}

and then filling in the data gaps using seq(). I'm just not sure how to put it all together efficiently. Thank you for your help.


Solution

  • After some trial and error I came up with a (not so pretty) solution that only requires base R functions.

    diagFillSeq <- function(diag_indx, fillgap=1){
      repeat{
        for(cols in 1:2){
          diag_indx <- diag_indx[order(diag_indx[, cols]), ] #Sort by selected column
          repeat{
            diffs <- diff(diag_indx[, cols]) 
            #Find breaks in sequence with differences >1 (diffs==1 are in sequence) and less than or equal to fillgap
            gap_indx <- which(diffs > 1 & diffs <= (fillgap +1)) #need +1 because fencepost error: 3rd & 7th post diffs=4 but fillgap=3)
            if(length(gap_indx) == 0){break}
            insert_indx <- gap_indx[1]
            seq_length <- diffs[gap_indx[1]] - 1  #need -1 because fencepost error
            #Subset diag_indx and insert filling sequence
            diag_indx <- rbind(diag_indx[1:insert_indx, ],
                          cbind(
                            as.integer( seq(from=diag_indx[insert_indx, 1] +1, to=diag_indx[insert_indx+1, 1] -1, length.out=seq_length) ),
                            as.integer( seq(from=diag_indx[insert_indx, 2] +1, to=diag_indx[insert_indx+1, 2] -1, length.out=seq_length) ) 
                          ),
                          diag_indx[(insert_indx+1):nrow(diag_indx), ]) 
          }
        }
        #Recheck first column to see if any new sequence gaps were created
        diffs <- diff(diag_indx[, 1])
        gap_indx <- which(diffs > 1 & diffs <= (fillgap +1))
        if(length(gap_indx) == 0){return(unname(diag_indx))}
      }
    }
    

    And a test on the diag_indx above

    whatIwant <- matrix(as.integer(c(1,2,3,3,4,5,6,7,7,8,9,14, 1,1,2,3,4,4,4,5,6,7,8,13)), ncol=2)
    whatIwant
    #      [,1] [,2]
    # [1,]    1    1
    # [2,]    2    1
    # [3,]    3    2
    # [4,]    3    3
    # [5,]    4    4
    # [6,]    5    4
    # [7,]    6    4
    # [8,]    7    5
    # [9,]    7    6
    #[10,]    8    7
    #[11,]    9    8
    #[12,]   14   13
    
    identical(diagFillSeq(diag_indx, fillgap=3), whatIwant)
    #TRUE