r, plot, missing-data, locf

Plot imputed values


I was asked to impute a dataset with both the LOCF and the NOCB methods using the na.locf() function from the zoo package, and I'm now trying to plot both the observed and the imputed values. The dataset I'm working with is the following:

structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27), 
    sex = c("F", "F", NA, "F", "F", "F", "F", "F", "F", "F", 
    "F", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", 
    "M", "M", "M", "M", "M"), d8 = c(21, 21, NA, 23.5, 21.5, 
    20, 21.5, 23, NA, 16.5, 24.5, 26, 21.5, 23, 25.5, 20, 24.5, 
    22, 24, 23, 27.5, 23, 21.5, 17, 22.5, 23, 22), d10 = c(20, 
    21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25, 25, 22.5, 22.5, 
    27.5, 23.5, 25.5, 22, 21.5, 20.5, 28, 23, 23.5, 24.5, 25.5, 
    24.5, 21.5), d12 = c(21.5, 24, NA, 25, 22.5, 21, 23, 23.5, 
    NA, 19, 28, 29, 23, NA, 26.5, 22.5, 27, 24.5, 24.5, 31, 31, 
    23.5, 24, 26, 25.5, 26, 23.5), d14 = c(23, 25.5, 26, 26.5, 
    23.5, 22.5, 25, 24, 21.5, 19.5, 28, 31, 26.5, 27.5, 27, 26, 
    28.5, 26.5, 25.5, 26, 31.5, 25, 28, 29.5, 26, 30, 25)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -27L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_double", 
    "collector")), sex = structure(list(), class = c("collector_character", 
    "collector")), d8 = structure(list(), class = c("collector_double", 
    "collector")), d10 = structure(list(), class = c("collector_double", 
    "collector")), d12 = structure(list(), class = c("collector_double", 
    "collector")), d14 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

I imputed the missing values by converting the original wide format to long format and then running the following steps:

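library(zoo)  # provides na.locf()

# `dati` holds the data frame shown above; reshape it from wide to long
data_long <- tidyr::gather(dati, age, measurements, d8:d14, factor_key = TRUE)

data_locf <- data_long

# LOCF carries the last observation forward; NOCB carries the next one backward
locf <- na.locf(data_locf$measurements, na.rm = TRUE, fromLast = FALSE)
nocb <- na.locf(data_locf$measurements, na.rm = TRUE, fromLast = TRUE)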

data_locf$measurements = ifelse(data_locf$age == 'd12', locf, nocb)

data_locf$sex = na.locf(data_locf$sex, na.rm = T, fromLast = T)

data_complete = complete(data = data_locf, fill = c(data_locf$measurements, data_locf$sex))

Does anyone know a way to plot the imputed values together with the observed ones? Below are a couple of functions that were recommended to me and that I started modifying, so far without success.

#1 plot
par(mfrow=c(1,1))
measurements <- data_complete$measurements
locf <- function(x) {
  a <- x[1]
  for (i in 2:length(x)) {
    if (is.na(x[i])) x[i] <- a
    else a <- x[i]
  }
  return(x)
}
meas1 <- na.locf(measurements)
colvec <- ifelse(is.na(measurements),mdc(2),mdc(1))
plot(measurements,col=colvec,type="l",xlab= 'sex' ,ylab="measurements")
points(measurements, col=colvec,pch=20,cex=1)

This does not produce a plot that is properly separated by sex. The second attempt is:

 #2 plot 
par(mfrow=c(1,2))
breaks <- seq(-20, 200, 10)
nudge <- 1
lwd <- 1.5
x <- matrix(c(breaks-nudge, breaks+nudge), ncol=2)
obs <- airquality[,"Ozone"]
mis  <- imp$imp$Ozone[,1]
fobs <- c(hist(obs, breaks, plot=FALSE)$counts, 0)
fmis <- c(hist(mis, breaks, plot=FALSE)$counts, 0)
y <- matrix(c(fobs, fmis), ncol=2)

tp <- xyplot(imp, Ozone~Solar.R, na.groups=ici(imp),
             ylab="Ozone (ppb)", xlab="Solar Radiation (lang)",
             cex = 0.75, lex=lwd, pch=19,
             ylim = c(-20, 180), xlim = c(0,350))
print(tp)

This reproduces a nice scatterplot for the airquality dataset used with the mice package. The crucial problem is that I cannot extract the imputed values when using the na.locf() function.

To clarify, I need to plot age and measurements (the response variable) against sex, which is why I need the two genders shown separately.


Solution

  • I might be a little late, but you could use the plotting functions of the imputeTS CRAN package, which lets you apply different imputation algorithms and plot the imputed values along with the observed ones.

    Short example:

    library("imputeTS")
    
    # Using tsAirgap as example time series
    
    # Last Observation Carried Forward - LOCF
    imp_locf <- na_locf(tsAirgap)
    
    # Next Observation Carried Backwards - NOCB
    imp_nocb <- na_locf(tsAirgap, option = "nocb")
    
    # Impute with Moving average
    imp_ma <- na_ma(tsAirgap)
    
    # Example plot for the na_ma imputations
    ggplot_na_imputations(tsAirgap, imp_ma)
    

    Here is what these plots look like: [image not available]
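
The example above only plots the moving-average result. As a minimal sketch, the same plotting call can be pointed at the LOCF and NOCB results instead; the object names `p_locf` and `p_nocb` are introduced here for illustration only.

```r
library(imputeTS)

# Impute the example series with LOCF and with NOCB
imp_locf <- na_locf(tsAirgap)
imp_nocb <- na_locf(tsAirgap, option = "nocb")

# One plot per method: observed values plus the imputed values highlighted
p_locf <- ggplot_na_imputations(tsAirgap, imp_locf)
p_nocb <- ggplot_na_imputations(tsAirgap, imp_nocb)

print(p_locf)
print(p_nocb)
```

Comparing the two plots side by side shows where carrying a value forward and carrying it backward disagree.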

    Other missing-data plots and imputation methods are also available, such as linear interpolation, spline interpolation, Stineman interpolation, seasonally adjusted imputation, and Kalman smoothing on state space models.
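
If you would rather stay with zoo::na.locf(), one possible approach is to record which cells are missing before imputing, since that information is lost afterwards. The sketch below is not the asker's exact pipeline: it uses a four-subject subset of the question's data (ids 1, 9, 12, 14, all with known sex), imputes within each subject (LOCF first, then NOCB for anything left at the start), and uses ggplot2 for the plot. The helper `impute_one` and the column names `imputed` and `status` are made up for this illustration.

```r
library(zoo)      # na.locf()
library(ggplot2)  # plotting

# Subset of the question's data in long format (ids 1, 9, 12, 14)
data_long <- data.frame(
  id  = rep(c(1, 9, 12, 14), each = 4),
  sex = rep(c("F", "F", "M", "M"), each = 4),
  age = factor(rep(c("d8", "d10", "d12", "d14"), times = 4),
               levels = c("d8", "d10", "d12", "d14")),
  measurements = c(21, 20,   21.5, 23,
                   NA, 21,   NA,   21.5,
                   26, 25,   29,   31,
                   23, 22.5, NA,   27.5)
)

# 1. Flag the missing cells BEFORE imputing
data_long$imputed <- is.na(data_long$measurements)

# 2. Impute within each subject: LOCF, then NOCB for any leading NA
#    (hypothetical helper, not part of zoo)
impute_one <- function(x) {
  x <- na.locf(x, na.rm = FALSE)
  na.locf(x, na.rm = FALSE, fromLast = TRUE)
}
data_long$measurements <- ave(data_long$measurements, data_long$id,
                              FUN = impute_one)

# 3. Plot one line per subject, colour points by status, one panel per sex
data_long$status <- factor(ifelse(data_long$imputed, "imputed", "observed"),
                           levels = c("observed", "imputed"))

p <- ggplot(data_long, aes(x = age, y = measurements, group = id)) +
  geom_line(colour = "grey60") +
  geom_point(aes(colour = status), size = 2) +
  scale_colour_manual(values = c(observed = "blue", imputed = "red"),
                      name = NULL) +
  facet_wrap(~ sex)
print(p)
```

Imputing per subject is a design choice of this sketch. Running na.locf() over the whole long-format column, as in the question, can carry a value over from a different subject.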