Search code examples
rnacalculated-columns

Calculating difference when N/A's are present


I have a table (listed below) in which we measured Temperature from a well. Unfortunately (mainly due to weather) there wasn't always water in the wells that could be measured so we have NA's persistent.

   Date   Temp(C)
   09-22    14
   09-29    19.6
   10-15    NA
   10-28    11
   11-06    NA
   11-21    8
   12-13    6

So what I want to do is create a new column calculating Delta T or change in temperature from one sampling date to the next one, and so on. If there is an NA, I want to avoid it and do the calculation using the last sampled temperature. But I have no idea how to write that code.


Solution

  • Something like this?

    df$TempDiff <- c(NA_real_, diff(zoo::na.locf(df$`Temp(C)`)))
    

    na.locf means replacing an NA with the most recent non-NA prior to it.

    Output looks like this

    # A tibble: 7 x 3
      Date  `Temp(C)` TempDiff
      <chr>     <dbl>    <dbl>
    1 09-22      14       NA  
    2 09-29      19.6      5.6
    3 10-15      NA        0  
    4 10-28      11       -8.6
    5 11-06      NA        0  
    6 11-21       8       -3  
    7 12-13       6       -2
    

    To do it by group

    df <- dplyr::group_by(df, Well)
    dplyr::mutate(df, TempDiff = c(NA_real_, diff(zoo::na.locf(`Temp(C)`))))
    

    Output

    # A tibble: 7 x 4
    # Groups:   Well [2]
      Date  `Temp(C)`  Well TempDiff
      <chr>     <dbl> <dbl>    <dbl>
    1 09-22      14       1     NA  
    2 09-29      19.6     1      5.6
    3 10-15      NA       1      0  
    4 10-28      11       2     NA  
    5 11-06      NA       2      0  
    6 11-21       8       2     -3  
    7 12-13       6       2     -2