I am working on a time series data. The dataset is:
datALL <- read.table(header=TRUE, text="
ID Year Align
A01 2017 329
A01 2016 NA
A01 2015 NA
A01 2014 314
A01 2013 NA
A01 2012 NA
A01 2011 432
A02 2017 4536
A02 2016 NA
A02 2015 NA
A02 2014 2345
A02 2013 NA
A02 2012 NA
A02 2011 1932
")
datALL
ID Year Align
1 A01 2017 329
2 A01 2016 NA
3 A01 2015 NA
4 A01 2014 314
5 A01 2013 NA
6 A01 2012 NA
7 A01 2011 432
8 A02 2017 4536
9 A02 2016 NA
10 A02 2015 NA
11 A02 2014 2345
12 A02 2013 NA
13 A02 2012 NA
14 A02 2011 1932
I want to impute missing value using imputeTS
package. The package works fine for individual ID
.
datA01 <- read.table(header=TRUE, text="
ID Year Align
A01 2017 329
A01 2016 NA
A01 2015 NA
A01 2014 314
A01 2013 NA
A01 2012 NA
A01 2011 432
")
datA01
ID Year Align
1 A01 2017 329
2 A01 2016 NA
3 A01 2015 NA
4 A01 2014 314
5 A01 2013 NA
6 A01 2012 NA
7 A01 2011 432
### install.packages("imputeTS")
library(imputeTS)
datA01$Year <- ts(datA01[, c(2)])
datA01$Align1 <- na_kalman(datA01$Align)
dat1
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 318.9847
3 A01 2015 NA 312.7852
4 A01 2014 314 314.0000
5 A01 2013 NA 347.2150
6 A01 2012 NA 387.7720
7 A01 2011 432 432.0000
For A02
it also works perfectly:
datA02 <- read.table(header=TRUE, text="
ID Year Align
A02 2017 4536
A02 2016 NA
A02 2015 NA
A02 2014 2345
A02 2013 NA
A02 2012 NA
A02 2011 1932
")
datA02$Year <- ts(datA02[, c(2)])
datA02$Align1 <- na_kalman(datA02$Align)
datA02
ID Year Align Align1
1 A02 2017 4536 4536.000
2 A02 2016 NA 3510.613
3 A02 2015 NA 3168.817
4 A02 2014 2345 2345.000
5 A02 2013 NA 2485.226
6 A02 2012 NA 2143.431
7 A02 2011 1932 1932.000
For all data together, it will not work because it takes all 14 years as a continuous time-series. It should take each seven-year based on the ID
. I need help in getting a loop function which can take care of that.
datALL$Year <- ts(datALL[, c(2)])
datALL$Align1 <- na_kalman(datALL$Align)
#### WRONG IMPUTATION DUE TO FAILUE IN SEPARATING YEARS BY ID
datALL
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 808.8287
3 A01 2015 NA 968.7716
4 A01 2014 314 314.0000
5 A01 2013 NA 1288.6573
6 A01 2012 NA 1448.6002
7 A01 2011 432 432.0000
8 A02 2017 4536 4536.0000
9 A02 2016 NA 1928.4289
10 A02 2015 NA 2088.3718
11 A02 2014 2345 2345.0000
12 A02 2013 NA 2408.2575
13 A02 2012 NA 2568.2004
14 A02 2017 1932 1932.0000
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 318.9847
3 A01 2015 NA 312.7852
4 A01 2014 314 314.0000
5 A01 2013 NA 347.2150
6 A01 2012 NA 387.7720
7 A01 2011 432 432.0000
8 A02 2017 4536 4536.000
9 A02 2016 NA 3510.613
10 A02 2015 NA 3168.817
11 A02 2014 2345 2345.000
12 A02 2013 NA 2485.226
13 A02 2012 NA 2143.431
14 A02 2011 1932 1932.000
dplyr
is your friend:
library(imputeTS)
library(dplyr)
datALL %>%
group_by(ID) %>%
mutate(Align1 = na_kalman(Align))
Result:
ID Year Align Align1
1 A01 2017 329 329.0000
2 A01 2016 NA 318.9847
3 A01 2015 NA 312.7852
4 A01 2014 314 314.0000
5 A01 2013 NA 347.2150
6 A01 2012 NA 387.7720
7 A01 2011 432 432.0000
8 A02 2017 4536 4536.0000
9 A02 2016 NA 3510.6131
10 A02 2015 NA 3168.8175
11 A02 2014 2345 2345.0000
12 A02 2013 NA 2485.2262
13 A02 2012 NA 2143.4306
14 A02 2011 1932 1932.0000
Note that if we turn year
to ts
in the mutate
, we get the following warning message because we are attempting to combine two different time series together:
Warning messages: 1: In mutate_impl(.data, dots) : Vectorizing 'ts' elements may not preserve their attributes 2: In mutate_impl(.data, dots) : Vectorizing 'ts' elements may not preserve their attributes
Turning year
to a ts
is actually not necessary (and incorrect) since na_kalman
takes numeric vectors as input and the Align
variables are the ones that take the values of the time series.