
Subtract the previous number when another row is not NA in R


This is a simplified data example.

df <- data.frame(
  SU = c(1, 1, NA, NA, NA, NA),
  TIME = c(10, 20, 30, 40, 50, 60),
  AMT = c(5, 5, NA, NA, 5, 5)
)

I want to generate a new column, TFDS. For rows where SU is NA, TFDS should be TIME minus the TIME of the most recent previous row whose AMT is not NA. For all other rows, TFDS should be NA.

This is what I want.

SU TIME AMT TFDS
1 10 5 NA
1 20 5 NA
NA 30 NA 10
NA 40 NA 20
NA 50 5 30
NA 60 5 10

I tried dplyr::lag(), but the output was not what I wanted.

library(dplyr) 

df <- df |> 
  mutate(
    TFDS = if_else(
      is.na(SU) == TRUE, TIME - lag(TIME)[!is.na(AMT)], NA
    ))

I believe dplyr::lag() should be used here, but how do I restrict it to the rows where AMT is not NA, i.e. how should [!is.na(AMT)] be applied?


Solution

  • You can create a temporary column and use tidyr::fill() to fill down from the TIME value that corresponds to the last non-NA AMT value, then subtract values using lag():

    library(dplyr)
    library(tidyr)
    
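    df |>
      # keep TIME only on rows where AMT is present (NA_real_ keeps the type double)
      mutate(tmp = if_else(!is.na(AMT), TIME, NA_real_)) |>
      # carry the last such TIME forward into the NA gaps
      fill(tmp, .direction = "down") |>
      # lag() makes each row compare against a strictly earlier row
      mutate(TFDS = if_else(is.na(SU), TIME - lag(tmp), NA_real_)) |>
      select(-tmp)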
    
    #   SU TIME AMT TFDS
    # 1  1   10   5   NA
    # 2  1   20   5   NA
    # 3 NA   30  NA   10
    # 4 NA   40  NA   20
    # 5 NA   50   5   30
    # 6 NA   60   5   10
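
    To see why both fill() and lag() are needed, it may help to keep the intermediate columns instead of dropping them. The sketch below rebuilds the example data and stores the lagged value in a helper column named prev. That name is only an illustration and is not part of the answer above. NA_real_ is used instead of a bare NA so that if_else() gets matching types on older dplyr versions as well.

    ```r
    library(dplyr)
    library(tidyr)

    # Example data from the question
    df <- data.frame(
      SU = c(1, 1, NA, NA, NA, NA),
      TIME = c(10, 20, 30, 40, 50, 60),
      AMT = c(5, 5, NA, NA, 5, 5)
    )

    steps <- df |>
      # TIME on rows where AMT is present, NA elsewhere
      mutate(tmp = if_else(!is.na(AMT), TIME, NA_real_)) |>
      # carry the last non-NA tmp forward
      fill(tmp, .direction = "down") |>
      mutate(
        # shift by one row so a row never uses its own TIME as the reference
        prev = lag(tmp),
        TFDS = if_else(is.na(SU), TIME - prev, NA_real_)
      )

    steps
    ```

    Row 5 shows why lag() matters: its own AMT is not NA, so after fill() tmp is 50 there, and TIME - tmp would give 0 instead of the desired 30. Shifting with lag() makes the row use 20, the TIME of the last earlier row with a non-NA AMT.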