Search code examples
rinterpolation

R interpolate rows with different start and end columns


I have multiple data sets that look like the following:

country 2000 2001 2002 2003 2004 2005 2006 2007
Germany 0.1 0.2 0.5 NA 0.7 0.6 0.8 NA
USA NA NA NA 0.1 0.4 NA 0.4 0.1
France 0.4 0.3 0.5 0.8 NA NA 1.0 1.3
Belgium NA 0.1 NA 0.5 0.6 0.5 NA NA

I want to interpolate horizontally, starting from the first existing value until the last existing value. In essence, NAs at the start and NAs at the end should stay NAs, but NAs in the middle should be interpolated. I have been trying to think of a possible solution, but nothing comes up. I appreciate your help!


Solution

  • You can use zoo::na.approx:

    df[-1] <- t(apply(df[-1], 1, \(x) na.approx(x, na.rm = F)))
    

    or even without apply:

    replace(df, -1, t(na.approx(t(df[-1]))))
    

    output

      country `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007`
    1 Germany    0.1    0.2    0.5    0.6  0.7    0.6      0.8   NA  
    2 USA       NA     NA     NA      0.1  0.4    0.4      0.4    0.1
    3 France     0.4    0.3    0.5    0.8  0.867  0.933    1      1.3
    4 Belgium   NA      0.1    0.3    0.5  0.6    0.5     NA     NA  
    

    Note that a tidy format might be easier for further wrangling here:

    library(dplyr)
    library(tidyr)
    df %>% 
      mutate(id = 1:n()) %>% 
      pivot_longer(-c(country, id)) %>% 
      group_by(id) %>% 
      mutate(value = na.approx(value, na.rm = F))