I am trying to perform a very simple data completion : I have two columns of the same measurement made in two different locations a few miles away from each other. Location 1 is more complete than location 2, and I want to complete the second with the first, by applying a coefficient (loc1/loc2) to 1.
My problem is that it is about weather values, that vary with the day considered, so applying the same mean coefficient to all my values would not be ideal. I first got mean coefficients by the values of loc1 and the day, but for the days with no loc2 values, I would like to :
and when there is no coef value at all for one loc1 value, then coef would be the same than for that loc1 value +/- an interval (the coef of the closest loc1 value on the closest date to the one to be completed) Here is what it looks like :
loc1 Date Coef loc2
1 12 204 3 4
2 8 147 4 2
3 8 204 NA NA
4 10 147 NA NA
5 10 158 NA NA
6 6 159 3 2
7 6 162 NA NA
8 6 170 2 3
9 .3 175 0.4833333 0.145
10 0.3 204 NA NA
11 0.4 146 NA NA
12 0.4 147 NA NA
I am pretty sure there is a non messy way of doing that, but with my limited knowledge using ifelse or ddply, I didn't get any closer to what I want in the end. I feel a loop could do the trick, but I have no idea how..
Any thoughts would be greatly appreciated ! many thanks in advance !
I don't understand exactly what you mean in bullet 3:
when there is no coef value at all for one loc1 value, then coef would be the same than for that loc1 value +/- an interval (the coef of the closest loc1 value on the closest date to the one to be completed)
So I don't have an answer for that part, but this might get you some of the way there:
library(dplyr)
df <- tibble::tribble(
~loc1, ~Date, ~Coef, ~loc2,
12, 204L, 3, 4,
8, 147L, 4, 2,
8, 204L, NA, NA,
10, 147L, NA, NA,
10, 158L, NA, NA,
6, 159L, 3, 2,
6, 162L, NA, NA,
6, 170L, 2, 3,
3, 175L, 0.4833333, 0.145,
0.3, 204L, NA, NA,
0.4, 146L, NA, NA,
0.4, 147L, NA, NA
)
df %>%
# Replace Coef with the coef of same loc1 and same day
group_by(loc1, Date) %>%
mutate(Coef = if_else(!is.finite(Coef), mean(Coef, na.rm = TRUE), Coef)) %>%
# For ones without same day and loc1, use the average of all days at loc1
group_by(loc1) %>%
mutate(Coef = if_else(!is.finite(Coef), mean(Coef, na.rm = TRUE), Coef)) %>%
ungroup() %>%
# Then complete the loc2 with using the completed Coef and loc1
mutate(loc2 = if_else(!is.finite(loc2), loc1 * Coef, loc2))
#> # A tibble: 12 x 4
#> loc1 Date Coef loc2
#> <dbl> <int> <dbl> <dbl>
#> 1 12.0 204 3.0000000 4.000
#> 2 8.0 147 4.0000000 2.000
#> 3 8.0 204 4.0000000 32.000
#> 4 10.0 147 NaN NaN
#> 5 10.0 158 NaN NaN
#> 6 6.0 159 3.0000000 2.000
#> 7 6.0 162 2.5000000 15.000
#> 8 6.0 170 2.0000000 3.000
#> 9 3.0 175 0.4833333 0.145
#> 10 0.3 204 NaN NaN
#> 11 0.4 146 NaN NaN
#> 12 0.4 147 NaN NaN