Search code examples
rexceltailrollapply

How to match third (or whatever) from the bottom in a rolling fashion in R?


Here is my example data frame with the expected output.

data.frame(index=c("3435pear","3435grape","3435apple","3435avocado","3435orange","3435kiwi","3436grapefruit","3436apple","3436banana","3436grape","3437apple","3437grape","3437avocado","3437orange","3438apple","3439apple","3440apple"),output=c("na","na","na","na","na","na","na","na","na","na","na","na","na","na","3435apple","3436apple","3437apple"))

                index    output
1        3435pear        na
2       3435grape        na
3       3435apple        na
4     3435avocado        na
5      3435orange        na
6        3435kiwi        na
7  3436grapefruit        na
8       3436apple        na
9      3436banana        na
10      3436grape        na
11      3437apple        na
12      3437grape        na
13    3437avocado        na
14     3437orange        na
15      3438apple 3435apple
16      3439apple 3436apple
17      3440apple 3437apple

I want to match the fruit that is third from the bottom as I go down the column. If there are not three previous fruits it should return NA. Once the 4th apple appears it matches the apple 3 before it, then the 5th apple appears it matches the one 3 before that one, and so on.

I was trying to use rollapply, match, and tail to make this work, but I don't know how to reference the current row for the matching. In excel I would use the large, if, and row functions to do this. Excel makes my computer grind for hours to calculate everything and I know R could do this in minutes(seconds?).


Solution

  • You can do this:

    library(dplyr) 
    
    df %>% 
      mutate(fruit = gsub("[0-9]", "", index)) %>% 
      group_by(fruit) %>% 
      mutate(new_output = lag(index, 3)) %>% 
      select(-fruit) %>%
      ungroup
    

    By each group of fruit, your new_output gives you the index value lagged by 3. I preserved the output column and saved my results in new_output so that you can compare.