Here is my example data frame with the expected output.
data.frame(index=c("3435pear","3435grape","3435apple","3435avocado","3435orange","3435kiwi","3436grapefruit","3436apple","3436banana","3436grape","3437apple","3437grape","3437avocado","3437orange","3438apple","3439apple","3440apple"),output=c("na","na","na","na","na","na","na","na","na","na","na","na","na","na","3435apple","3436apple","3437apple"))
index output
1 3435pear na
2 3435grape na
3 3435apple na
4 3435avocado na
5 3435orange na
6 3435kiwi na
7 3436grapefruit na
8 3436apple na
9 3436banana na
10 3436grape na
11 3437apple na
12 3437grape na
13 3437avocado na
14 3437orange na
15 3438apple 3435apple
16 3439apple 3436apple
17 3440apple 3437apple
I want to match the fruit that is third from the bottom as I go down the column. If there are not three previous fruits it should return NA. Once the 4th apple appears it matches the apple 3 before it, then the 5th apple appears it matches the one 3 before that one, and so on.
I was trying to use rollapply, match, and tail to make this work, but I don't know how to reference the current row for the matching. In excel I would use the large, if, and row functions to do this. Excel makes my computer grind for hours to calculate everything and I know R could do this in minutes(seconds?).
You can do this:
library(dplyr)
df %>%
mutate(fruit = gsub("[0-9]", "", index)) %>%
group_by(fruit) %>%
mutate(new_output = lag(index, 3)) %>%
select(-fruit) %>%
ungroup
By each group of fruit
, your new_output
gives you the index
value lagged by 3. I preserved the output
column and saved my results in new_output
so that you can compare.