I have a data frame like the one below, in which I create a lagged column by applying lag() to the observed values in the values column. Each row in my data frame belongs to a specific journey. I would like to correct the lag() operation, because it currently ignores whether a value is the first one in a new journey, where there should be no previous recording. I then want to drop that row from my data frame.
Printing df_output shows the desired result, but I selected those rows by hand.
My real data frame has many rows, and therefore many journeys, so I need a way to do this programmatically.
# Reproducible example
df <- data.frame(
  tours = c("kuu122", "kuu122", "ansc123123", "ansc123123", "ansc123123",
            "ansc123123", "baa3999", "baa3999", "baa3999", "baa3999"),
  order = c(4, 5, rep(c(1, 2, 3, 4), 2)),
  journey = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  values = c(50, 60, 10, 20, 15, 13, 28, 15, 22, 14)
)
# Get the observed values at order_t
observed_values <- df$values
# Create lagged column with dplyr::lag()
# (base R's stats::lag() does not shift the values of a plain vector)
df$prev_values <- dplyr::lag(observed_values, 1)
# TODO: remove each row that is the first observation of a new journey,
# because its prev_values comes from the previous journey
# Desired result, currently selected by hand:
df_output <- df[c(2, 4:6, 8:10), ]
df_output
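A minimal sketch of a grouped lag in base R, assuming the order column gives the recording sequence within each journey. It uses ave() to apply a one-position shift separately to each journey, so the first row of every journey gets NA instead of the previous journey's last value, and then drops those first rows. On the example data it should select the same rows as the hand-picked df_output. The name pos_in_journey is a hypothetical helper introduced only for illustration.

```r
# Reproducible example from the question
df <- data.frame(
  tours = c("kuu122", "kuu122", "ansc123123", "ansc123123", "ansc123123",
            "ansc123123", "baa3999", "baa3999", "baa3999", "baa3999"),
  order = c(4, 5, rep(c(1, 2, 3, 4), 2)),
  journey = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  values = c(50, 60, 10, 20, 15, 13, 28, 15, 22, 14)
)

# Assumption: 'order' is the recording sequence inside a journey,
# so sort by journey first and then by order
df <- df[order(df$journey, df$order), ]

# Within-journey lag: shift each journey's values down by one position;
# the first row of every journey gets NA
df$prev_values <- ave(df$values, df$journey,
                      FUN = function(v) c(NA, head(v, -1)))

# Hypothetical helper: position of each row inside its journey (1 = first)
pos_in_journey <- ave(seq_len(nrow(df)), df$journey, FUN = seq_along)

# Drop the first observation of every journey
df_output <- df[pos_in_journey > 1, ]
df_output
```

Filtering on the position rather than on is.na(prev_values) avoids accidentally dropping rows whose previous value is a genuine NA. With dplyr, the same idea is usually written as group_by(journey), then mutate(prev_values = lag(values)) and filter(row_number() > 1).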
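Using base R with duplicated(), which is TRUE for every row whose journey already appeared in an earlier row, so the first row of each journey is dropped:
subset(df, duplicated(journey))
Output: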
        tours order journey values
2      kuu122     5       1     60
4  ansc123123     2       2     20
5  ansc123123     3       2     15
6  ansc123123     4       2     13
8     baa3999     2       3     15
9     baa3999     3       3     22
10    baa3999     4       3     14
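Why the duplicated() approach is sufficient: once the first row of each journey is removed, every remaining row's predecessor belongs to the same journey, so an ungrouped lag computed before subsetting is already correct for the kept rows. The sketch below checks this on the example data. It uses c(NA, head(x, -1)) as a dependency-free stand-in for dplyr::lag(x, 1), and it assumes that each journey's rows are contiguous and in recording order, as they are in the example.

```r
# Reproducible example from the question
df <- data.frame(
  tours = c("kuu122", "kuu122", "ansc123123", "ansc123123", "ansc123123",
            "ansc123123", "baa3999", "baa3999", "baa3999", "baa3999"),
  order = c(4, 5, rep(c(1, 2, 3, 4), 2)),
  journey = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  values = c(50, 60, 10, 20, 15, 13, 28, 15, 22, 14)
)

# Ungrouped lag, same values as dplyr::lag(df$values, 1);
# it must be computed BEFORE rows are removed
df$prev_values <- c(NA, head(df$values, -1))

# Keep every row except the first of each journey
df_output <- subset(df, duplicated(journey))

# Sanity check: for every kept row, the row directly above it in df
# belongs to the same journey, so prev_values never crosses a boundary
kept <- match(rownames(df_output), rownames(df))
stopifnot(all(df$journey[kept - 1] == df$journey[kept]))

df_output
```

If the rows of a journey were interleaved with other journeys or out of order, this check would fail and a grouped lag, sorted by order within each journey, would be needed instead.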