I have difficulties interpolating a grouped dataframe. My problem is: Some of my groups have no value at all, then either 0 or NA; some groups have only one value, then this should be multiplied and if there are several values (N >= 2), then a linear interpolarisation should be carried out.
In the following an example df and two things I tried (both failed).
df <- data.frame(ID = seq(1,20, 1), group = rep(1:4,1, each = 5), year = rep(2016:2020,4), value =sample(1:100, 20) )
df <- df %>% mutate(value = if_else(group == 1, NA, value))
df[c(8:9, 15:16, 18:20),"value"] <- NA
###First Try
df %>%group_by(group) %>%
mutate(
int= if_else(all(is.na(value)), 0, value[1]),
int= if_else(sum(!is.na(value))<=2, 0, na.omit(value[1])),
int= if_else(is.na(value), imputeTS::na_interpolation(value), service[1]))
###second try
df %>%group_by(group) %>%
mutate(
int = case_when(
sum(!is.na(value)) == 0 ~ 0,
sum(!is.na(value)) == 1 ~ na.omit(value)[1],
TRUE ~ as.numeric(imputeTS::na_interpolation(value))))
However I do not stick to: imputeTS::na_interpolation(value), others would be fine as well.
You can use na.approx
from zoo
, which doesn't have the strict "At least 2 non-NA data points are required" condition:
library(zoo)
df %>%
group_by(group) %>%
mutate(int = case_when(
sum(!is.na(value))==0 ~ 0,
sum(!is.na(value))==1 ~ na.omit(value)[1],
sum(!is.na(value)) >1 ~ na.approx(value, na.rm=FALSE),
.default = NA))
Gives
# A tibble: 20 × 5
# Groups: group [4]
ID group year value int
<dbl> <int> <int> <int> <dbl>
1 1 1 2016 NA 0
2 2 1 2017 NA 0
3 3 1 2018 NA 0
4 4 1 2019 NA 0
5 5 1 2020 NA 0
6 6 2 2016 33 33
7 7 2 2017 79 79
8 8 2 2018 NA 60.3
9 9 2 2019 NA 41.7
10 10 2 2020 23 23
11 11 3 2016 11 11
12 12 3 2017 2 2
13 13 3 2018 83 83
14 14 3 2019 80 80
15 15 3 2020 NA NA
16 16 4 2016 NA 64
17 17 4 2017 64 64
18 18 4 2018 NA 64
19 19 4 2019 NA 64
20 20 4 2020 NA 64