Search code examples
rmeandata-manipulationmissing-dataimputation

adding missing data based on mean of values before and after record


Missing data of the WT column is coded as -99. I want to replace the missing values by the average of the previous and next value of WT.

For example, missing values are at time =2 and time=3 I want to compute Time1+Time4/2 and add it on both time =2 and time =3 i.e. 5.5 will be added on both Time= 2 and 3. It should be the same for all missing values.

ID  TIME    WT
1   0   4
1   1   5
1   2   -99
1   3   -99
1   4   6
1   5   9
1   6   12
1   7   -99
1   8   -99
1   9   -99
1   10  -99
1   12  16
1   14  18
1   16  20

Solution

  • I've added missing values to the first and last rows for testing:

    data <- data.frame(ID=1,TIME=0:16,WT=4:20)
    data[c(1,3,4,8,9,10,11,16,17),"WT"] <- -99
    

    A somewhat inelegant approach but you could try first dealing with the possibility of missing first and last values:

    data$WT[data$WT == -99] <- NA
    missing <- which(is.na(data$WT))
    
    if(1 %in% missing){
      data$WT[1] <- head(data$WT[!is.na(data$WT)],1)
    }
    if(nrow(data) %in% missing){
      data$WT[nrow(data)] <- tail(data$WT[!is.na(data$WT)],1)
    }
    

    Then, you could find the start and ends of each run of NAs with this function:

    get_runs <- function(x){
      starts <- which(diff(x) == 1)
      y <- rle(x)
      len <- y$lengths[y$values==TRUE]
      ends <- starts + len+1
      return(list(starts=starts,len=len,ends=ends, i=1:length(starts)))
    }
    
    r <- get_runs(is.na(data$WT))
    

    Finally, loop through the missing values and fill them:

    for(i in r$i){
      idx <- seq(r$starts[i]+1,r$ends[i]-1,1)
      data$WT[idx] <- (data$WT[r$starts[i]] + data$WT[r$ends[i]])/2
    }
    
    data
       ID TIME   WT
    1   1    0  5.0
    2   1    1  5.0
    3   1    2  6.5
    4   1    3  6.5
    5   1    4  8.0
    6   1    5  9.0
    7   1    6 10.0
    8   1    7 12.5
    9   1    8 12.5
    10  1    9 12.5
    11  1   10 12.5
    12  1   11 15.0
    13  1   12 16.0
    14  1   13 17.0
    15  1   14 18.0
    16  1   15 18.0
    17  1   16 18.0