I have two data.tables in which I have some missing values. The non-missing values make a curve and the missing values should fit into this curve.
temp_dt_1 = structure(list(x = c(33, 34, 35, 36, 37, 38, 39, 40, 45, 50,
51.5, 52.5, 53, 54, 55, 55.5, 56, 56.5, 57, 57.5, 58, 59, 59.5,
60, 60.5, 61, 61.5, 62, 62.5, 63, 63.5, 64, 65, 66, 66.5, 67,
67.5), y = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 2.33, NA, NA, 1.6, NA, NA, 0.91, NA, 0.665,
NA, 0.465, 0.39, 0.325, 0.265, 0.24, 0.135, 0.1, 0.095, NA, 0.105
)), row.names = c(NA, -37L), class = c("data.table", "data.frame"
))
> temp_dt_1
x y
1: 33.0 NA
2: 34.0 NA
3: 35.0 NA
4: 36.0 NA
5: 37.0 NA
6: 38.0 NA
7: 39.0 NA
8: 40.0 NA
9: 45.0 NA
10: 50.0 NA
11: 51.5 NA
12: 52.5 NA
13: 53.0 NA
14: 54.0 NA
15: 55.0 NA
16: 55.5 NA
17: 56.0 NA
18: 56.5 2.330
19: 57.0 NA
20: 57.5 NA
21: 58.0 1.600
22: 59.0 NA
23: 59.5 NA
24: 60.0 0.910
25: 60.5 NA
26: 61.0 0.665
27: 61.5 NA
28: 62.0 0.465
29: 62.5 0.390
30: 63.0 0.325
31: 63.5 0.265
32: 64.0 0.240
33: 65.0 0.135
34: 66.0 0.100
35: 66.5 0.095
36: 67.0 NA
37: 67.5 0.105
The second data.table
has missing values on both sides (at start and end of y
) -
temp_dt_2 = structure(list(x = c(25, 27.5, 30, 32.5, 35, 40, 43, 45, 47,
50, 52.5, 55, 57.5, 60, 65, 85, 90), y = c(NA, NA, 1.1, NA,
1.765, 2.64, 3.42, 3.975, NA, 5.625, 6.95, 7.825, 8.95, 10.375,
13.725, NA, NA)), row.names = c(NA, -17L), class = c("data.table",
"data.frame"))
In the plot below the black line shows the known points (which are present in data.table) and the red line is the approximate expected output for temp_dt_1
library(ggplot2)
ggplot(temp_dt_1, aes(x, y)) + geom_line()
I am looking for only data.table
solution. I will really appreciate it if the code has good performance.
You can use approxfun
, splinefun
, loess
, ... to fill up missing values.
res <- cbind(a = approxfun(temp_dt_1$x, temp_dt_1$y, rule=2)(temp_dt_1$x),
s = splinefun(temp_dt_1$x, temp_dt_1$y)(temp_dt_1$x),
l = predict(loess(y ~ x, temp_dt_1, control = loess.control(surface = "direct")), temp_dt_1))
plot(temp_dt_1$x, temp_dt_1$y, ylim=range(res))
for(i in 1:3) lines(temp_dt_1$x, res[,i], col=i, lwd=2)
legend("topright", c("approxfun", "spline", "loess"), lty=1, lwd=2, col=1:3)