I have a grouped dataframe;
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000,3000,700,800,900,1000,20000,20000,30000,30000)
DF <- data.frame(Truck, OilChanged, Odometer)
# Truck OilChanged Odometer
# 1 A True 1000
# 2 A NewOil 1000
# 3 A False 2000
# 4 A False 3000
# 5 B False 700
# 6 B False 800
# 7 B False 900
# 8 B False 1000
# 9 C True 20000
# 10 C NewOil 20000
# 11 C True 30000
# 12 C NewOil 30000
I'm trying to infer the age of the oil (in kilometers) when possible. Inference is only possible once an oil change has occurred. If no oil change occurs the age of the oil will remain a mystery (example: Truck B).
Below is the desired result;
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000, 3000,700,800,900,1000,20000,20000,30000,30000)
OilAge <- c(NA,0,1000,2000,NA,NA,NA,NA,NA,0,10000,0)
Result <- data.frame(Truck, OilChanged, Odometer, OilAge)
# Truck OilChanged Odometer OilAge
# 1 A True 1000 NA
# 2 A NewOil 1000 0
# 3 A False 2000 1000
# 4 A False 3000 2000
# 5 B False 700 NA
# 6 B False 800 NA
# 7 B False 900 NA
# 8 B False 1000 NA
# 9 C True 20000 NA
# 10 C NewOil 20000 0
# 11 C True 30000 10000
# 12 C NewOil 30000 0
Note: The odometer reading between a True oilchanged row and it's following NewOil row will always be the same. Because an oil sample is taken directly before the oil is changed. But both rows must be maintained for downstream calculations to function properly, such as rate-of-change formulas.
NA in the OilAge column means the age is a mystery.
Please let me know if this solution works for you.
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000,3000,700,800,900,1000,20000,20000,30000,30000)
DF <- data.frame(Truck, OilChanged, Odometer)
DF %>%
group_by(Truck) %>%
mutate(status = length(unique(OilChanged)),
OilAge = ifelse(OilChanged == "NewOil", 0,
ifelse(OilChanged == "False", Odometer - (Odometer - lag(Odometer)),
ifelse(OilChanged == "True", Odometer - lag(Odometer), NA)))) %>%
mutate(OilAge = ifelse(status !=1, OilAge, NA)) %>%
subset(select = c(Truck, OilChanged, Odometer, OilAge))