Search code examples
rmissing-datamoving-averageimputation

r replace each missing value with a mean of two previous values


I have a data frame with some NAs in column 'myvalues':

x <- data.frame(mydates = as.Date(c("2018/04/01","2018/04/02","2018/04/03","2018/04/04",
                                                     "2018/04/05","2018/04/06","2018/04/07","2018/04/08",
                                                     "2018/04/09","2018/04/10","2018/04/11")),
                                 myvalues = c(2.3, NA, 2.1, 2.2, NA, 2.4, 2.3, 2.1, NA, NA, 2.6))
x

I'd like to replace each NA with the mean of the previous two values. For the NA in row 2, it should be equal to the value in row 1. I could do it using a 'for' loop through NAs in x$myvalues. However, it's very slow, and I am looking for a fast solution, because I'll have to do the same thing for millions on tiny data frames like x.

Thank you very much!


Solution

  • You can reduce the Reduce function. In this case, for example, the last NA will be the average value of the previous two values, but it first fills the previous value then uses that to obtain the current value

    x$myvalues=Reduce(function(x,y)if(is.na(y))c(x,mean(tail(x,2))) else c(x,y),x$myvalues)
    > x
          mydates myvalues
    1  2018-04-01     2.30
    2  2018-04-02     2.30
    3  2018-04-03     2.10
    4  2018-04-04     2.20
    5  2018-04-05     2.15
    6  2018-04-06     2.40
    7  2018-04-07     2.30
    8  2018-04-08     2.10
    9  2018-04-09     2.20
    10 2018-04-10     2.15
    11 2018-04-11     2.60