Search code examples
rexceldplyrlag

How to use the lag function correctly in r dplyr?


I get the below incorrect output for the last cell in column reSeq when running the R/dplyr code immediately beneath. The code produces a value of 8 in that last cell of column reSeq, when via the lag() function in the code it should instead produce a 7. What is wrong with my use of the lag() function? Also see image at the bottom that better explains what I am trying to do.

   Element Group eleCnt reSeq
   <chr>   <dbl>  <int> <int>
 1 R           0      1     1
 2 R           0      2     2
 3 X           0      1     1
 4 X           1      2     2
 5 X           1      3     2
 6 X           0      4     4
 7 X           0      5     5
 8 X           0      6     6
 9 B           0      1     1
10 R           0      3     3
11 R           2      4     4
12 R           2      5     4
13 X           3      7     7
14 X           3      8     7
15 X           3      9     8

library(dplyr)

myDF <- data.frame(
  Element = c("R","R","X","X","X","X","X","X","B","R","R","R","X","X","X"),
  Group = c(0,0,0,1,1,0,0,0,0,0,2,2,3,3,3)
)

myDF %>% 
  group_by(Element) %>%
    mutate(eleCnt = row_number()) %>%
  ungroup()%>%
  mutate(reSeq = eleCnt) %>%
  mutate(reSeq = ifelse(
    Element == lag(Element)& Group == lag(Group) & Group > 0, 
    lag(reSeq),
    eleCnt)
  )

The above is an attempted translation from Excel as show in this image below. I am new to R, migrating over from Excel. I am trying to replicate the column D "Target", highlighted in yellow with the formula to the right. The below shows the correct output, including the desired 7 in cell D17 which I can't replicate with the above R code.

enter image description here

Breaking the derivation of "Target" down into 2 columns, Step1 and Step2, highlighted in yellow and blue in the below image (Step2 below is same as Target in above image)(2 steps is how I got the R code working as shown in one of the solutions):

enter image description here


Solution

  • The below code works. I broke the Excel "Target" calculation into 2 steps in the 2nd image in the OP in order to reflect the step-wise R solution.

    library(dplyr)
    library(tidyr)
    
    myDF <- data.frame(
      Element = c("R","R","X","X","X","X","X","X","B","R","R","R","X","X","X"),
      Group = c(0,0,0,1,1,0,0,0,0,0,2,2,3,3,3)
    )
    
    myDF %>% 
      group_by(Element) %>%
        mutate(eleCnt = row_number()) %>%
      ungroup()%>%
      mutate(reSeq = ifelse(Group == 0 | Group != lag(Group), eleCnt,0)) %>%
      mutate(reSeq = na_if(reSeq, 0)) %>%
      group_by(Element) %>%
        fill(reSeq) %>%
      ungroup