I have a table (listed below) in which we measured Temperature from a well. Unfortunately (mainly due to weather) there wasn't always water in the wells that could be measured so we have NA's persistent.
Date Temp(C)
09-22 14
09-29 19.6
10-15 NA
10-28 11
11-06 NA
11-21 8
12-13 6
So what I want to do is create a new column calculating Delta T or change in temperature from one sampling date to the next one, and so on. If there is an NA, I want to avoid it and do the calculation using the last sampled temperature. But I have no idea how to write that code.
Something like this?
df$TempDiff <- c(NA_real_, diff(zoo::na.locf(df$`Temp(C)`)))
na.locf
means replacing an NA
with the most recent non-NA prior to it.
Output looks like this
# A tibble: 7 x 3
Date `Temp(C)` TempDiff
<chr> <dbl> <dbl>
1 09-22 14 NA
2 09-29 19.6 5.6
3 10-15 NA 0
4 10-28 11 -8.6
5 11-06 NA 0
6 11-21 8 -3
7 12-13 6 -2
To do it by group
df <- dplyr::group_by(df, Well)
dplyr::mutate(df, TempDiff = c(NA_real_, diff(zoo::na.locf(`Temp(C)`))))
Output
# A tibble: 7 x 4
# Groups: Well [2]
Date `Temp(C)` Well TempDiff
<chr> <dbl> <dbl> <dbl>
1 09-22 14 1 NA
2 09-29 19.6 1 5.6
3 10-15 NA 1 0
4 10-28 11 2 NA
5 11-06 NA 2 0
6 11-21 8 2 -3
7 12-13 6 2 -2