How to avoid gaps due to missing values in matplot in R?


I have a function that uses matplot to plot some data. The data structure looks like this:

test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])

So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.

test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1]) 

In the real situation this is inside a function, the matrix is bigger, and the number of rows and columns and the positions of the (non-overlapping) missing values may change between calls, so I'd like a solution that handles this flexibly. I also need to use matlines.

I was thinking of maybe filling in the gaps with interpolated data, but maybe there is a better solution.
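
For reference, the interpolation idea could be sketched roughly as follows with base R's approx(). The helper name fill_gaps_linear is made up for illustration. The sketch assumes that linear interpolation is acceptable for the data and that each column has at least two non-missing values; columns with fewer are returned unchanged, and leading or trailing NAs stay NA.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

x <- test[, 1]
y <- as.matrix(test[, -1])

# Hypothetical helper: linearly interpolate the NAs of one column over x.
fill_gaps_linear <- function(x, col) {
  ok <- !is.na(col)
  # approx() needs at least two known points to interpolate
  if (sum(ok) < 2) return(col)
  # rule = 1 leaves values outside the known range as NA
  approx(x[ok], col[ok], xout = x, rule = 1)$y
}

# Apply column by column, so the number of columns and the NA positions can vary
y_filled <- apply(y, 2, function(col_i) fill_gaps_linear(x, col_i))

matplot(x, y_filled, type = "n")  # set up the axes only
matlines(x, y_filled)             # draw the gap-free lines
```

Note that this draws through invented points, which matters if markers are plotted (e.g. type = "b"), since the filled-in values would then look like real observations.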


Solution

  • I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.

    The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.

    Here's the code for the functions:

    # x -> vector
    # bool -> boolean vector; must be same length as x. The values of x where bool 
    #   is TRUE will be 'shifted' to the front of the vector, and the back of the
    #   vector will be all NA (i.e. the number of NAs in the resulting vector is
    #   sum(!bool))
    # returns the 'shifted' vector (will be the same length as x)
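    shift_vec_na <- function(x, bool){
      n <- sum(bool)
      if(n < length(x)){
        # seq_len(n) is empty when n == 0 (an all-NA column), whereas 1:n
        # would be c(1, 0) and make the assignment fail
        x[seq_len(n)] <- x[bool]
        x[(n + 1):length(x)] <- NA
      }
      return(x)
    }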
    
    # x -> vector
    # y -> matrix, where nrow(y) == length(x)
    # returns a list of two elements ('x' and 'y') that contain the 'adjusted'
    # values that can be used with 'matplot()'
    adj_data_matplot <- function(x, y){
      y2 <- apply(y, 2, function(col_i){
        return(shift_vec_na(col_i, !is.na(col_i)))
      })
      
      x2 <- apply(y, 2, function(col_i){
        return(shift_vec_na(x, !is.na(col_i)))
      })
      return(list(x = x2, y = y2))
    }
    
    

    Then, using the sample data:

    test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
    test$a[3:4] <- NA
    test$b[7] <- NA
    lst <- adj_data_matplot(test[,1], test[,-1])
    
    matplot(lst$x, lst$y, type = "b")
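
Since the question asks for matlines specifically, here is a rough sketch of how the same shifting idea might be combined with it. The helper shift_na_last is a hypothetical, compact stand-in for the shifting step and is not the code from the answer above; it assumes x has one entry per row of y.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

x <- test[, 1]
y <- as.matrix(test[, -1])

# Hypothetical helper: keep the entries where 'keep' is TRUE, in their
# original order, and pad the end with NA so the length is unchanged.
shift_na_last <- function(v, keep) c(v[keep], rep(NA, sum(!keep)))

# Shift y column by column, and build a matching x matrix per column
y2 <- apply(y, 2, function(col_i) shift_na_last(col_i, !is.na(col_i)))
x2 <- apply(y, 2, function(col_i) shift_na_last(x, !is.na(col_i)))

matplot(x, y, type = "n")  # empty plot with the right axis ranges
matlines(x2, y2)           # lines now span the former gaps
```

Because the NAs end up at the tail of each column, every line is drawn through its observed points only, and no values are invented.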