r dplyr approximation

Using approx in dplyr


I'm trying to linearly interpolate the missing values of x within each id in the data frame, using year as the interpolation axis. dplyr seems like an appropriate tool for this, but I can't get it to work because of an error:

Error: incompatible size (9), expecting 3 (the group size) or 1

Sample code:

library(dplyr)
dat <- data.frame(id = c(1,1,1,2,2,2,3,3,3), year = c(1,2,3,1,2,3,1,2,3), x = c(1,NA,2, 3, NA, 4, 5, NA, 6))

# Linear Interpolation
dat %>% 
  group_by(id) %>% 
  mutate(x2 = as.numeric(unlist(approx(x = dat$year, y = dat$x, xout = dat$x)[2])))

Sample Data:

  id year  x
1  1    1  1
2  1    2 NA
3  1    3  2
4  2    1  3
5  2    2 NA
6  2    3  4
7  3    1  5
8  3    2 NA
9  3    3  6

Solution

  • Here are a couple of approaches (transferred from comments):

    1) na.approx/ave

    library(zoo)
    
    transform(dat, x2 = ave(x, id, FUN = na.approx))
    

    With year being 1, 2, 3 we did not need to specify it, but if it were needed then:

    nr <- nrow(dat)
    transform(dat, x2 = ave(1:nr, id, FUN = function(i) with(dat[i, ], na.approx(x, year))))
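
To see when the year argument matters, here is a minimal sketch with unevenly spaced years. The vectors `x` and `year` below are made up for illustration and are not part of the question's data.

```r
library(zoo)

x    <- c(1, NA, 5)
year <- c(1, 2, 5)            # hypothetical, unevenly spaced years

by_position <- na.approx(x)        # treats the points as equally spaced: 1 3 5
by_year     <- na.approx(x, year)  # weights by distance in year:         1 2 5
```

With the question's years (1, 2, 3) both calls give the same result, which is why the shorter form is enough there.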
    

    2) na.approx/dplyr

    library(dplyr)
    library(zoo)
    
    dat %>%
      group_by(id) %>%
      mutate(x2 = na.approx(x, year)) %>%
      ungroup()
    

    If year is not needed then omit the second argument to na.approx.
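
For completeness, the original attempt with base approx can probably be made to work as well. The error arises because dat$year and dat$x refer to all 9 rows of the data frame rather than the 3 rows of the current group, so the result has the wrong size. The sketch below is not from the original comments; it assumes every group has at least two non-NA values of x, which approx needs in order to interpolate.

```r
library(dplyr)

dat <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                  year = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  x    = c(1, NA, 2, 3, NA, 4, 5, NA, 6))

res <- dat %>%
  group_by(id) %>%
  # Bare column names refer to the current group only;
  # xout = year asks for one interpolated value per row of the group.
  mutate(x2 = approx(x = year, y = x, xout = year)$y) %>%
  ungroup()
```

Here x2 holds the interpolated series, with 1.5, 3.5 and 5.5 filling the gaps for ids 1, 2 and 3.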

    Note: zoo also has other NA filling functions, particularly na.spline and na.locf.
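
One caveat when using these functions inside a grouped mutate: by default na.approx drops leading and trailing NAs that it cannot interpolate, so the result can be shorter than the group and would trigger a size error similar to the one in the question. Passing na.rm = FALSE keeps the length unchanged. A small sketch on a made-up vector (not the question's data):

```r
library(zoo)

x <- c(NA, 1, NA, 3, NA)   # hypothetical vector with leading and trailing NAs

trimmed <- na.approx(x)                 # 1 2 3 (outer NAs dropped)
kept    <- na.approx(x, na.rm = FALSE)  # NA 1 2 3 NA (same length as x)
carried <- na.locf(x, na.rm = FALSE)    # NA 1 1 3 3 (last observation carried forward)
```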