Search code examples
rlinear-interpolation

interpolarization within groups with NA


I have difficulties interpolating a grouped dataframe. My problem is: Some of my groups have no value at all, then either 0 or NA; some groups have only one value, then this should be multiplied and if there are several values (N >= 2), then a linear interpolarisation should be carried out.

In the following an example df and two things I tried (both failed).

df <- data.frame(ID = seq(1,20, 1), group = rep(1:4,1, each = 5), year = rep(2016:2020,4), value =sample(1:100, 20) )
df <- df %>% mutate(value = if_else(group == 1, NA, value))
df[c(8:9, 15:16, 18:20),"value"] <- NA


###First Try
df %>%group_by(group) %>% 
  mutate(
int= if_else(all(is.na(value)), 0, value[1]),
int= if_else(sum(!is.na(value))<=2, 0, na.omit(value[1])),
int= if_else(is.na(value), imputeTS::na_interpolation(value), service[1]))

###second try
df %>%group_by(group) %>% 
  mutate(
int = case_when(
  sum(!is.na(value)) == 0 ~ 0,  
  sum(!is.na(value)) == 1 ~ na.omit(value)[1],
  TRUE ~ as.numeric(imputeTS::na_interpolation(value))))

However I do not stick to: imputeTS::na_interpolation(value), others would be fine as well.


Solution

  • You can use na.approx from zoo, which doesn't have the strict "At least 2 non-NA data points are required" condition:

    library(zoo)
    
    df %>% 
      group_by(group) %>% 
      mutate(int = case_when(
          sum(!is.na(value))==0 ~ 0,  
          sum(!is.na(value))==1 ~ na.omit(value)[1],
          sum(!is.na(value)) >1 ~ na.approx(value, na.rm=FALSE),
          .default = NA))
    

    Gives

    # A tibble: 20 × 5
    # Groups:   group [4]
          ID group  year value   int
       <dbl> <int> <int> <int> <dbl>
     1     1     1  2016    NA   0  
     2     2     1  2017    NA   0  
     3     3     1  2018    NA   0  
     4     4     1  2019    NA   0  
     5     5     1  2020    NA   0  
     6     6     2  2016    33  33  
     7     7     2  2017    79  79  
     8     8     2  2018    NA  60.3
     9     9     2  2019    NA  41.7
    10    10     2  2020    23  23  
    11    11     3  2016    11  11  
    12    12     3  2017     2   2  
    13    13     3  2018    83  83  
    14    14     3  2019    80  80  
    15    15     3  2020    NA  NA  
    16    16     4  2016    NA  64  
    17    17     4  2017    64  64  
    18    18     4  2018    NA  64  
    19    19     4  2019    NA  64  
    20    20     4  2020    NA  64