Here is a simplified example of my data:
```r
df <- data.frame(
  SU   = c(1, 1, NA, NA, NA, NA),
  TIME = c(10, 20, 30, 40, 50, 60),
  AMT  = c(5, 5, NA, NA, 5, 5)
)
```
I want to create a new column `TFDS`. When `SU` is `NA`, `TFDS` should be the current row's `TIME` minus the `TIME` of the most recent *earlier* row whose `AMT` is not `NA`. Otherwise `TFDS` is `NA`.

This is the result I want:
SU | TIME | AMT | TFDS |
---|---|---|---|
1 | 10 | 5 | NA |
1 | 20 | 5 | NA |
NA | 30 | NA | 10 |
NA | 40 | NA | 20 |
NA | 50 | 5 | 30 |
NA | 60 | 5 | 10 |
I tried `dplyr::lag()`, but the output was not what I wanted:
```r
library(dplyr)
df <- df |>
  mutate(
    TFDS = if_else(
      is.na(SU) == TRUE, TIME - lag(TIME)[!is.na(AMT)], NA
    ))
```
I believe `dplyr::lag()` is the right tool here, but how do I apply the `[!is.na(AMT)]` condition?
You can create a temporary column that holds `TIME` only on rows where `AMT` is not `NA`, fill it down with `tidyr::fill()` so every row carries the `TIME` of the most recent non-`NA` `AMT`, and then subtract the *previous* row's value using `lag()`:
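```r
library(dplyr)
library(tidyr)

df |>
  # keep TIME only on rows where AMT is recorded
  mutate(tmp = if_else(!is.na(AMT), TIME, NA_real_)) |>
  # carry the last recorded TIME down over the NA rows
  fill(tmp, .direction = "down") |>
  # lag() so that only strictly earlier AMT rows count
  mutate(TFDS = if_else(is.na(SU), TIME - lag(tmp), NA_real_)) |>
  select(-tmp)
#   SU TIME AMT TFDS
# 1  1   10   5   NA
# 2  1   20   5   NA
# 3 NA   30  NA   10
# 4 NA   40  NA   20
# 5 NA   50   5   30
# 6 NA   60   5   10
```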
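For comparison, here is a minimal base-R sketch that needs neither dplyr nor tidyr. It also shows why the original attempt goes wrong: `lag(TIME)[!is.na(AMT)]` *removes* elements instead of masking them, so the result has 4 values rather than 6, and R recycles it against `TIME` (with a warning) instead of lining it up row by row. The names `has_amt`, `lagged_time`, `last_amt_time` and `prev_amt_time` are illustrative, and, like the answer above, the sketch assumes the rows are already sorted by `TIME`.

```r
# Simplified data from the question
df <- data.frame(
  SU   = c(1, 1, NA, NA, NA, NA),
  TIME = c(10, 20, 30, 40, 50, 60),
  AMT  = c(5, 5, NA, NA, 5, 5)
)

has_amt <- !is.na(df$AMT)

# Same values as dplyr::lag(TIME): shift down by one, NA in the first row
lagged_time <- c(NA, head(df$TIME, -1))

# Subsetting drops rows: 4 values left, which no longer align with 6 rows
length(lagged_time[has_amt])

# For each row, the TIME of the latest row (up to and including it) with
# non-NA AMT; the leading NA covers rows before the first such row
last_amt_time <- c(NA, df$TIME[has_amt])[cumsum(has_amt) + 1]

# Shift by one so only strictly earlier AMT rows count (like lag())
prev_amt_time <- c(NA, head(last_amt_time, -1))

df$TFDS <- ifelse(is.na(df$SU), df$TIME - prev_amt_time, NA)
df
```

The `cumsum(has_amt) + 1` index maps each row to the most recent non-`NA` `AMT` row seen so far, which plays the same role as the filled-down `tmp` column.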