I have a multivariate time series dataset (almost 30 years) with random missing values.
T | S | po4 | si | din |
---|---|---|---|---|
9.00000 | NA | 0.290 | 5.310 | 18.51 |
8.45000 | NA | 0.130 | 6.180 | 14.74 |
13.60000 | 36.46000 | 0.010 | 0.500 | 1.86 |
23.20000 | 32.12000 | 0.010 | 6.580 | 0.81 |
26.00000 | 32.13000 | 0.070 | 0.500 | 0.23 |
NA | 35.41400 | 0.010 | 1.670 | 0.72 |
24.80000 | 36.42000 | 0.000 | 3.540 | |
24.20000 | 33.16000 | 0.110 | 2.020 | |
22.50000 | 37.60000 | 0.040 | 0.400 | |
16.32000 | 36.01000 | 0.020 | 2.900 | |
17.60000 | 38.04000 | 0.010 | 0.970 | |
9.70000 | 36.36000 | 0.120 | 7.950 | |
13.80000 | 38.33000 | 0.010 | 5.760 | |
7.90000 | 35.51000 | 0.060 | 2.350 | |
11.90000 | 38.33000 | 0.030 | 3.410 | |
24.10000 | 36.30000 | 0.020 | 0.730 | |
25.20000 | 35.77000 | 0.020 | 1.370 | |
24.70000 | 37.54000 | 0.330 | 0.700 | |
5.75000 | 33.26000 | 0.120 | 0.860 | |
13.30000 | 33.14000 | 0.000 | 0.000 | |
13.60000 | 38.21265 | 0.000 | 0.190 | |
15.70000 | 28.33000 | 0.040 | 11.500 | 41.64 |
I would like to fill the gaps so that the series has a constant frequency (the data are monthly, with missing values), which would let me try different techniques in the context of a time series analysis. I have tried the mice package in R, using with() and pool() to decide which imputed dataset to use. However, I don't want to use all of them in a model; I want to obtain the most correct one and use it for further analysis. How can I do that? How can I find the best one?
Otherwise, can you suggest another method as an alternative to mice?
Thank you very much in advance
If your series are strongly correlated over time, you can use the imputeTS package, which is designed for time series imputation.
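    library(imputeTS)
    # Kalman-smoothing imputation; on a data frame each numeric column is imputed separately
    imputed <- na_kalman(your_dataframe)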
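For monthly data it usually helps to tell R about the seasonality first. Here is a minimal sketch on one synthetic series; the `temp` values, the 1990 start date and the 10% missingness are assumptions for illustration only, not your data. It fills the same gaps with `na_kalman()` and, for comparison, with `na_seadec()` and `na_interpolation()`.

```r
library(imputeTS)

set.seed(42)
n <- 120                                   # ten years of monthly values (synthetic)
month <- rep(1:12, length.out = n)
temp <- 16 + 9 * sin(2 * pi * (month - 4) / 12) + rnorm(n)
temp[sample(n, 12)] <- NA                  # remove 12 values at random

# frequency = 12 marks the series as monthly; the start date is hypothetical
temp_ts <- ts(temp, start = c(1990, 1), frequency = 12)

filled_kalman <- na_kalman(temp_ts)                               # structural model + Kalman smoothing
filled_seadec <- na_seadec(temp_ts, algorithm = "interpolation")  # deseasonalise, interpolate, reseasonalise
filled_linear <- na_interpolation(temp_ts, option = "linear")     # plain linear interpolation

# count the NAs left by each method
print(sapply(
  list(kalman = filled_kalman, seadec = filled_seadec, linear = filled_linear),
  function(x) sum(is.na(x))
))
```

With a real data frame you would repeat this for each column, since imputeTS treats every series on its own and does not use the other variables.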
There are also several other methods included in the package.
As for mice, the whole point of multiple imputation is to have several imputed datasets. You run your analysis separately on each of them and then compare or pool the results. Imputation always carries some uncertainty, because the replacements for your missing data are only estimates, and this technique lets you model and get a feeling for that uncertainty.
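As a rough sketch of that workflow, the following runs mice on a small synthetic data frame, fits the same model on every imputed dataset with `with()`, and combines the fits with `pool()`. The column names `temp`, `sal` and `po4`, the simulated values and the `lm()` formula are placeholders I made up; substitute your own data and model.

```r
library(mice)

set.seed(1)
n <- 120                                   # synthetic monthly data, illustration only
month <- rep(1:12, length.out = n)
df <- data.frame(
  temp = 16 + 9 * sin(2 * pi * (month - 4) / 12) + rnorm(n),
  sal  = 36 + rnorm(n),
  po4  = abs(rnorm(n, mean = 0.05, sd = 0.05))
)
# remove 12 values at random from each column
for (col in names(df)) df[sample(n, 12), col] <- NA

# five imputed datasets using predictive mean matching
imp <- mice(df, m = 5, method = "pmm", seed = 1, printFlag = FALSE)

# fit the same model on each imputed dataset, then pool the five fits
fit <- with(imp, lm(temp ~ sal + po4))
pooled <- pool(fit)
print(summary(pooled))
```

The pooled standard errors include the variation between the imputed datasets, which is exactly the uncertainty that a single "best" dataset would hide.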
If you don't want to run multiple analyses and prefer single imputation, you can use any one of these datasets; they are equally valid, and there is no best one.
Alternatively, you could use a single imputation package such as missForest.
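Both single-imputation routes can be sketched as follows, again on made-up data (the column names and simulated values are assumptions, not your dataset). `mice::complete()` extracts one of the imputed datasets, and `missForest()` returns its completed data in the `ximp` element.

```r
library(mice)
library(missForest)

set.seed(2)
n <- 120                                   # synthetic monthly data, illustration only
month <- rep(1:12, length.out = n)
df <- data.frame(
  temp = 16 + 9 * sin(2 * pi * (month - 4) / 12) + rnorm(n),
  sal  = 36 + rnorm(n),
  po4  = abs(rnorm(n, mean = 0.05, sd = 0.05))
)
# remove 12 values at random from each column
for (col in names(df)) df[sample(n, 12), col] <- NA

# Route 1: take a single completed dataset out of a mice run
imp <- mice(df, m = 5, seed = 2, printFlag = FALSE)
single_mice <- mice::complete(imp, 1)      # the first of the five; any index 1..5 is equally valid

# Route 2: random-forest based single imputation
rf <- missForest(df)
single_rf <- rf$ximp                       # completed data frame
print(rf$OOBerror)                         # out-of-bag estimate of the imputation error

print(colSums(is.na(single_mice)))
print(colSums(is.na(single_rf)))
```

Neither route takes the time ordering into account, so for strongly seasonal series it may be worth comparing the result against a time series method.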