Search code examples
rdataframelag

subtract value from previous row by group


In R, let's say I have this data frame:

Data
id      date        value
2380    10/30/12    21.01
2380    10/31/12    22.04
2380    11/1/12     22.65
2380    11/2/12     23.11
20100   10/30/12    35.21
20100   10/31/12    37.07
20100   11/1/12     38.17
20100   11/2/12     38.97
20103   10/30/12    57.98
20103   10/31/12    60.83 

And I want to subtract the previous value from the current value, by group ID date, to create this:

id      date        value   diff
2380    10/30/12    21.01   0
2380    10/31/12    22.04   1.03
2380    11/1/12     22.65   0.61
2380    11/2/12     23.11   0.46
20100   10/30/12    35.21   0
20100   10/31/12    37.07   1.86
20100   11/1/12     38.17   1.1
20100   11/2/12     38.97   0.8
20103   10/30/12    57.98   0
20103   10/31/12    60.83   2.85

Solution

  • With dplyr:

    library(dplyr)
    
    data %>%
        group_by(id) %>%
        arrange(date) %>%
        mutate(diff = value - lag(value, default = first(value)))
    

    For clarity you can arrange by date and grouping column (as per comment by lawyer)

    data %>%
        group_by(id) %>%
        arrange(date, .by_group = TRUE) %>%
        mutate(diff = value - lag(value, default = first(value)))
    

    or lag with order_by:

    data %>%
        group_by(id) %>%
        mutate(diff = value - lag(value, default = first(value), order_by = date))
    

    With data.table:

    library(data.table)
    
    dt <- as.data.table(data)
    setkey(dt, id, date)
    dt[, diff := value - shift(value, fill = first(value)), by = id]