Search code examples
rdataframeaggregatesubtractiontapply

How to subtract every previous rows from the lead row to every five rows in R?


I have a larger data frame that has multiple columns and thousands of rows. I want to replace the value of every lead row by subtracting the previous row value from the lead row for every five rows of the data frame. For example, the first value should retain its value, the second row should be: second row - first row. Similarly, the sixth row should retain its value, however, the seventh row would be seventh row - sixth row. Here is an example data frame

DF = data.frame(A= c(1:11), B = c(11:21))

The outputput should be like below

> Output
    A  B
1   1 11
2   1  1
3   1  1
4   1  1
5   1  1
6   6 16
7   1  1
8   1  1
9   1  1
10  1  1
11 11 21

Solution

  • One option would be to create a grouping variable and then do the transformation with diff which does the difference of adjacent elements of the columns selected in mutate_all (if only a subset of columns are needed either use mutate_if or mutate_at)

    library(dplyr) #v_0.8.3
    DF %>% 
       group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
       mutate_all(~c(first(.), diff(.))) %>%
       ungroup %>%
       select(-grp)
    # A tibble: 11 x 2
    #       A     B
    #   <int> <int>
    # 1     1    11
    # 2     1     1
    # 3     1     1
    # 4     1     1
    # 5     1     1
    # 6     6    16
    # 7     1     1
    # 8     1     1
    # 9     1     1
    #10     1     1
    #11    11    21
    

    The above also gives a warning when we use mutate_all after group_by (previously it used to work - in the new versions, the correct syntax would be to use mutate_at

    DF %>% 
       group_by(grp = as.integer(gl(n(), 5, n()))) %>% 
       mutate_at(vars(-group_cols()), ~c(first(.), diff(.))) %>%
       ungroup %>%
       select(-grp)