In a Panel Data, I have some NA values that I would like to extrapolate at the end or start of my years of interest. Most solutions to similar questions deal with interpolation. Please note that this is not the case.
A similar sample of my data called "data" looks like this:
> data
REGION YEAR VALUE
1 A 2011 NA
2 A 2012 NA
3 A 2013 NA
4 A 2014 20.00
5 A 2015 25.00
6 A 2016 30.00
7 A 2017 35.00
8 A 2018 40.00
9 B 2011 NA
10 B 2012 0.30
11 B 2013 0.50
12 B 2014 0.70
13 B 2015 0.90
14 B 2016 0.11
15 B 2017 0.13
16 B 2018 0.15
17 C 2011 100.00
18 C 2012 101.00
19 C 2013 102.00
20 C 2014 103.00
21 C 2015 104.00
22 C 2016 105.00
23 C 2017 106.00
24 C 2018 NA
Some solutions that I found to similar questions:
I have try with na_interpolate
but it just repeats the last values. I also tried with mutate
combined with na.spline
, but it creates values that are obviously wrong for a linear regression
I am sure that there must be a simple way to estimate these values.
The expected result should look like this:
> data
REGION YEAR VALUE
1 A 2011 5.00
2 A 2012 10.00
3 A 2013 15.00
4 A 2014 20.00
5 A 2015 25.00
6 A 2016 30.00
7 A 2017 35.00
8 A 2018 40.00
9 B 2011 0.10
10 B 2012 0.30
11 B 2013 0.50
12 B 2014 0.70
13 B 2015 0.90
14 B 2016 0.11
15 B 2017 0.13
16 B 2018 0.15
17 C 2011 100.00
18 C 2012 101.00
19 C 2013 102.00
20 C 2014 103.00
21 C 2015 104.00
22 C 2016 105.00
23 C 2017 106.00
24 C 2018 107.00
Thank you for your help
You can just do:
predictions <- round(predict(lm(VALUE ~ REGION * YEAR, df), newdata = df), 2)
predictions
#> 1 2 3 4 5 6 7 8 9 10
#> 5.00 10.00 15.00 20.00 25.00 30.00 35.00 40.00 0.65 0.59
#> 11 12 13 14 15 16 17 18 19 20
#> 0.53 0.46 0.40 0.34 0.27 0.21 100.00 101.00 102.00 103.00
#> 21 22 23 24
#> 104.00 105.00 106.00 107.00
df$VALUE[is.na(df$VALUE)] <- predictions[is.na(df$VALUE)]
Which gives the desired result:
df
#> REGION YEAR VALUE
#> 1 A 2011 5.00
#> 2 A 2012 10.00
#> 3 A 2013 15.00
#> 4 A 2014 20.00
#> 5 A 2015 25.00
#> 6 A 2016 30.00
#> 7 A 2017 35.00
#> 8 A 2018 40.00
#> 9 B 2011 0.65
#> 10 B 2012 0.30
#> 11 B 2013 0.50
#> 12 B 2014 0.70
#> 13 B 2015 0.90
#> 14 B 2016 0.11
#> 15 B 2017 0.13
#> 16 B 2018 0.15
#> 17 C 2011 100.00
#> 18 C 2012 101.00
#> 19 C 2013 102.00
#> 20 C 2014 103.00
#> 21 C 2015 104.00
#> 22 C 2016 105.00
#> 23 C 2017 106.00
#> 24 C 2018 107.00