
Keep first row after calculating difference between rows with dplyr::lag


My question is similar to this OP and this OP, with one minor difference that seems to make it overly complicated.

Example of my data:

ind_id   wt   date
1002     25   1987-07-27
1002     15   1988-05-05
2340     30   1987-03-18
2340     52   1989-08-15

I am calculating the difference between consecutive wt values within each ind_id, after group_by(ind_id).

To do this:

df <- df %>% 
    group_by(ind_id) %>%
    mutate(mass_diff = wt - lag(wt))

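This gives me the following output (only the last row of each group is shown here; the first row of each group gets mass_diff = NA because lag() has no previous value):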

ind_id   wt   date        mass_diff
1002     15   1988-05-05  -10
2340     52   1989-08-15  22
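A minimal sketch of that intermediate step on the example data: mutate() keeps all four rows, and the two-row table above corresponds to dropping the rows where mass_diff is NA. The filter() call is an assumption; the question does not show how those rows were removed.

```r
library(dplyr)

# Example data from the question (dates rebuilt from the printed values)
df <- data.frame(
  ind_id = c(1002, 1002, 2340, 2340),
  wt     = c(25, 15, 30, 52),
  date   = as.Date(c("1987-07-27", "1988-05-05", "1987-03-18", "1989-08-15"))
)

# mutate() keeps every row; lag() returns NA for the first row of each group
with_diff <- df %>%
  group_by(ind_id) %>%
  mutate(mass_diff = wt - lag(wt)) %>%
  ungroup()
print(with_diff)

# Dropping the NA rows leaves the last row per group, with that row's own wt
last_rows <- filter(with_diff, !is.na(mass_diff))
print(last_rows)
```

This is why wt ends up as 15 and 52: each surviving row still carries its own weight, not the group's first one.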

However, I want the output to keep the first wt value in each group, not the last one.

Desired output:

ind_id   wt   date        mass_diff
1002     25   1988-05-05  -10
2340     30   1989-08-15  22

Note that wt is the only column I want to keep from the first row; date and mass_diff should come from the last row. (This example is simplified; my real data has 18 rows.)

Any suggestions (using dplyr) would be appreciated!


Solution

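  • A possible solution: compute mass_diff first, then overwrite wt with the first value in each group, and keep only the last row of each group: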

    library(tidyverse)
    
    df <- structure(list(ind_id = c(1002, 1002, 2340, 2340), wt = c(25, 
    15, 30, 52), date = structure(c(6416, 6699, 6285, 7166), class = "Date")), row.names = c(NA, 
    -4L), class = "data.frame")
    
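    df %>% 
      group_by(ind_id) %>%
      mutate(mass_diff = wt - lag(wt)) %>%  # difference from the previous row
      mutate(wt = first(wt)) %>%            # replace wt with the group's first value
      slice_tail(n = 1) %>%                 # keep only the last row of each group
      ungroup()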
    
    #> # A tibble: 2 × 4
    #>   ind_id    wt date       mass_diff
    #>    <dbl> <dbl> <date>         <dbl>
    #> 1   1002    25 1988-05-05       -10
    #> 2   2340    30 1989-08-15        22
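The pipeline above relies on the rows already being in date order within each ind_id, because lag(), first() and slice_tail() all work on row position. A sketch that does not depend on input order uses the order_by argument of dplyr's lag() and first(), and slice_max() on date. It assumes dates are unique within each individual (with ties, slice_max() keeps every tied row). The shuffled data frame below is a hypothetical reordering of the example data.

```r
library(dplyr)

# Example data, deliberately shuffled so rows are not in date order
df <- data.frame(
  ind_id = c(2340, 1002, 2340, 1002),
  wt     = c(52, 15, 30, 25),
  date   = as.Date(c("1989-08-15", "1988-05-05", "1987-03-18", "1987-07-27"))
)

result <- df %>%
  group_by(ind_id) %>%
  mutate(
    # difference from the previous measurement in time, not in row position
    mass_diff = wt - lag(wt, order_by = date),
    # earliest weight for the individual (wt is still the original column here)
    wt = first(wt, order_by = date)
  ) %>%
  # keep the most recent measurement per individual
  slice_max(date, n = 1) %>%
  ungroup() %>%
  arrange(ind_id)

print(result)
```

On this shuffled input it should return the same two rows as the answer's pipeline does on the sorted data.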