I want to impute missing values of Temp at a Station by replacing the mean of the Temp at two closest Stations.
library(tidyverse)
library(lubridate)
tb1 <-
tibble::tibble(
Date = as_date(rep(c("2019-01-01", "2019-01-02"), each = 4))
, Stat = rep(c("F", "L", "M", "R"), times = 2)
, Lat = rep(c(31.418715, 31.582045, 30.181459, 33.626057), times = 2)
, Long = rep(c(73.079109, 74.329376, 71.492157, 73.071442), times = 2)
, Temp = c(NA, 20, 28, 25, 26, 25, NA, 24)
)
tb1
# A tibble: 8 x 5
Date Stat Lat Long Temp
<date> <chr> <dbl> <dbl> <dbl>
1 2019-01-01 F 31.4 73.1 NA
2 2019-01-01 L 31.6 74.3 20
3 2019-01-01 M 30.2 71.5 28
4 2019-01-01 R 33.6 73.1 25
5 2019-01-02 F 31.4 73.1 26
6 2019-01-02 L 31.6 74.3 25
7 2019-01-02 M 30.2 71.5 NA
8 2019-01-02 R 33.6 73.1 24
This replacing the missing values by the average of non-missing temp at all stations.
impute.mean <- function(x) {
replace(x, is.na(x), mean(x, na.rm = TRUE))
}
tb1 %>%
group_by(Date) %>%
mutate(Temp1 = impute.mean(Temp))
# A tibble: 8 x 6
# Groups: Date [2]
Date Stat Lat Long Temp Temp1
<date> <chr> <dbl> <dbl> <dbl> <dbl>
1 2019-01-01 F 31.4 73.1 NA 24.3
2 2019-01-01 L 31.6 74.3 20 20
3 2019-01-01 M 30.2 71.5 28 28
4 2019-01-01 R 33.6 73.1 25 25
5 2019-01-02 F 31.4 73.1 26 26
6 2019-01-02 L 31.6 74.3 25 25
7 2019-01-02 M 30.2 71.5 NA 25
8 2019-01-02 R 33.6 73.1 24 24
Used this code the find the distance between two Stations
library(geosphere)
distm(
x = c(73.079109, 31.418715)
, y = c(74.329376, 31.582045)
, fun = distHaversine
)
[,1]
[1,] 120053.3
Could not figure out how to calculate distance using tidyverse?
tb1 %>%
mutate(
Dist = distm(
x = c(Long, Lat)
, y = c(Long, Lat)
, fun = distHaversine
)
)
Error in .pointsToMatrix(x) : Wrong length for a vector, should be 2
I add below a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.
Stations in tb1 without an observation for Temp:
tb1_na <- tb1 %>% filter(is.na(Temp))
Create function to determine distances to stations for a certain date:
circle_fn <- function(x, y, z){
spatialrisk::points_in_circle(tb1 %>% filter(Date == z),
lon_center = x,
lat_center = y,
lon = Long,
lat = Lat,
radius = 1e6)
}
Since each element of the output is a data frame, purrr::map_dfr is used to row-bind them together:
purrr::pmap_dfr(list(tb1_na$Long, tb1_na$Lat, tb1_na$Date),
circle_fn, .id = "tb1_na") %>%
group_by(tb1_na) %>%
slice(2:3) %>%
summarize(Temp = mean(Temp)) %>%
ungroup() %>%
bind_cols(tb1_na, .) %>%
select(-tb1_na)
Output:
Date Stat Lat Long Temp Temp1
<date> <chr> <dbl> <dbl> <dbl> <dbl>
1 2019-01-01 F 31.4 73.1 NA 24
2 2019-01-02 M 30.2 71.5 NA 25.5