Search code examples
rdataframedplyrlead

How to flag the last row of a data frame group?


Suppose we start with the below dataframe df:

 ID <- c(1, 1, 1, 5, 5)
 Period <- c(1,2,3,1,2)
 Value <- c(10,12,11,4,6)
 df <- data.frame(ID, Period, Value)

      ID Period Value
    1  1      1    10
    2  1      2    12
    3  1      3    11
    4  5      1     4
    5  5      2     6

Now using dplyr I add a "Calculate" column that multiplies Period and Value of each row, giving me the following:

> df %>% mutate(Calculate = Period * Value)

  ID Period Value Calculate
1  1      1    10        10
2  1      2    12        24
3  1      3    11        33
4  5      1     4         4
5  5      2     6        12

I'd like to modify the above "Calculate" to give me a value of 0, when reaching the last row for a given ID, so that the data frame output looks like:

  ID Period Value Calculate
1  1      1    10        10
2  1      2    12        24
3  1      3    11         0
4  5      1     4         4
5  5      2     6         0

I was going to use the lead() function to peer at the next row to see if the ID changes but wasn't sure that happens when reaching the end of the data frame.

How could this be accomplished using dplyr?


Solution

  • You can group_by ID and replace the last row for each ID with 0.

    library(dplyr)
    
    df %>% 
      mutate(Calculate = Period * Value) %>%
      group_by(ID) %>%
      mutate(Calculate = replace(Calculate, n(), 0)) %>%
      ungroup
    
    #     ID Period Value Calculate
    #  <dbl>  <dbl> <dbl>     <dbl>
    #1     1      1    10        10
    #2     1      2    12        24
    #3     1      3    11         0
    #4     5      1     4         4
    #5     5      2     6         0