
Missing data using plot() in R: should I use na.omit(), !is.na(), approx()? If so then how?


The problem

I have three variables recorded over time. The first (black) is recorded at every time period, the second (blue) every other time period, the third (red) at every time period except one. I try to plot these in R:

test <- data.frame(time=c(1:5), black=c(3, 3, 3, 3, 3), blue=c(1, NA, 3, NA, 5), red=c(5, 4, NA, 2, 1))

plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

The result is a plot in which 'black' is the only continuous line, 'blue' is completely absent, and 'red' is absent between time 2 and time 4. I would like all three lines to be continuous.

Attempted solutions

From How to connect dots where there are missing values?

plot(na.omit(test), test$time, test$black, type="l", col="black")

Returns "Error in match.fun(panel) : 'test$black' is not a function, character or symbol".

na.omit(test)
plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

The plot is the same as in my original problem. Moreover, na.omit(test) drops every time period in which any one of the variables is missing, so valid data (in this example, for black) is discarded along with the missing values.
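A quick check on the same test data frame illustrates both effects. This is only a sketch: the name complete is introduced here for illustration. na.omit() returns a new data frame rather than modifying test, and it keeps only the rows with no missing value in any column.

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# na.omit() returns a filtered copy; 'test' itself is left untouched
complete <- na.omit(test)

# Only the time periods where every column is present survive
print(complete$time)

# The original data frame still has all five rows, NAs included
print(nrow(test))
```

Because the result was never assigned, the later plot() and lines() calls still see the original data with its NA values.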

From How to I draw a line plot and ignore missing values in R

plot(type="l", test$time, test$black, col="black")
lines(which(!is.na(test$blue)), na.omit(test$blue), test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

Returns "Error in plot.xy(xy.coords(x, y), type = type, ...) : invalid plot type". Even amending the first line to plot(test$time, test$black, col="black") does not resolve this error.

From How to connect dots where there are missing values?

plot(approx(test, xout=seq_along(test))$y, type="l", test$time, test$black, col="black")

Returns "Error in xy.coords(x, y, xlabel, ylabel, log) :  'x' and 'y' lengths differ".

From R - Plotting a line with missing NA values

There it is commented that na.omit() or na.approx() "seem to work only if I would plot 'A' separately in a stand-alone plot, they do not seem to work in conjunction with 'Time' and 'B' and 'C' all in the same plot", and this is described as a "super weird bug". They suggest:

plot(test$time[!is.na(test$black)],test$black[!is.na(test$black)],type="l")
lines(test$time,test$blue, type="l",col="blue")
lines(test$time, test$red, type="l", col="red")

The plot is the same as in my original problem. If I change the coding for 'blue' to (test$time, test$blue, type="p", col="blue") then I get a single point at time point 3, but not the line that I would expect.

Also from R - Plotting a line with missing NA values

xlim <- range(test$time)
ylim <- range(subset[-1], na.rm = TRUE)

Quickly returns "Error in subset[-1] : object of type 'closure' is not subsettable".

ok <- ! is.na(test$black)
plot(black ~ time, time, time = ok, type = "l", xlim = xlim, ylim = ylim)

Quickly returns "Error in FUN(X[[i]], ...) : invalid 'envir' argument of type 'closure'". I also cannot see how 'blue' or 'red' data would enter into this plot, even if it didn't return an error.

So is there any way to use plot() for plotting multiple variables over time when one of them has missing data?


Solution

  • If you want to connect the dots across missing data, you can fill the gaps by linear interpolation with na.approx() from the zoo package:

    install.packages('zoo')  # only needed once, if zoo is not already installed
    library(zoo)
    
    # Create the data frame
    test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3), blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))
    
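    # Interpolate missing values against time; na.rm = FALSE keeps any
    # leading/trailing NA so the vectors stay the same length as test$time
    test$blue <- na.approx(test$blue, x = test$time, na.rm = FALSE)
    test$red <- na.approx(test$red, x = test$time, na.rm = FALSE)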
    
    # Plot the data
    # Plot the data; ylim spans all three series, not just 'black'
    plot(test$time, test$black, type = "l", col = "black", ylim = range(test[-1], na.rm = TRUE))
    lines(test$time, test$blue, col = "blue")
    lines(test$time, test$red, col = "red")
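The same linear interpolation can probably be done without an extra package, using approx() from base R's stats package. The sketch below is one way to do it: fill_linear, blue_filled and red_filled are hypothetical names introduced here, and the original columns are left unmodified.

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Hypothetical helper: interpolate y linearly at every x,
# using only the points where y is not missing
fill_linear <- function(x, y) {
  ok <- !is.na(y)
  approx(x[ok], y[ok], xout = x)$y
}

blue_filled <- fill_linear(test$time, test$blue)
red_filled <- fill_linear(test$time, test$red)

# ylim is taken from all series so that none is clipped
plot(test$time, test$black, type = "l", col = "black",
     ylim = range(test[-1], na.rm = TRUE))
lines(test$time, blue_filled, col = "blue")
lines(test$time, red_filled, col = "red")
```

With approx()'s default rule = 1, values outside the observed range of a series stay NA. That does not arise in this example, because each series has data at the first and last time period.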
    

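If you would rather not create interpolated values at all, a possible alternative is to drop the missing points separately for each series with !is.na(). lines() only draws a segment between two consecutive non-missing points, which is why 'blue' (present only at every other time period) draws nothing when passed as-is. The sketch below uses blue_ok and red_ok as illustrative names:

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# plot() sizes the axes from the first series only, so set ylim from all of them
plot(test$time, test$black, type = "l", col = "black",
     ylim = range(test[-1], na.rm = TRUE))

# Keep only the observed points of each series; x and y are subset together
blue_ok <- !is.na(test$blue)
red_ok <- !is.na(test$red)
lines(test$time[blue_ok], test$blue[blue_ok], col = "blue")
lines(test$time[red_ok], test$red[red_ok], col = "red")
```

On a line plot this looks the same as linear interpolation, since the remaining points are joined by straight lines, but the data frame keeps its NA values.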