r diff zoo

Use zoo to calculate difference by group


I have the following data. How can I calculate the difference between height and height[0], grouped by id? That is, within each id the first heightdiff should be 0, the next should be height[1] - height[0], and so on. Ideally this would use the zoo package or diff. Thanks.

dat <- structure(list(id = c(80006L, 80006L, 80006L, 80006L, 80006L, 
80006L, 80006L, 80006L, 80006L, 80006L, 80006L, 80006L, 80006L, 
80006L, 80006L, 80016L, 80016L, 80016L, 80016L, 80016L, 80016L, 
80024L, 80024L, 80024L, 80024L, 80024L, 80024L, 80024L, 80024L, 
80024L), group = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), height = c(97.12, 101.35, 102.39, 103.49, 101.64, 
105.88, 109.31, 107.37, 115.08, 116.83, 119.03, 117.01, 122.57, 
132.27, 162.08, 105.01, 108.13, 115.58, 122.46, 130.33, 148.52, 
89.78, 95.27, 98.99, 98.55, 100.84, 108.46, 109.49, 115.75, 118.52
)), row.names = c(NA, 30L), class = "data.frame")

Solution

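  • R uses 1-based indexing rather than 0-based, so the first height in a group is x[1], not x[0]. Assuming the data frame is stored in dat, we can use ave to subtract the first height from every height within each id, which makes the first difference in each group 0. No packages are needed.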

    transform(dat, hdiff = ave(height, id, FUN = function(x) x - x[1]))
    

    Alternatively, with dplyr we can write:

    library(dplyr) # version 1.1.0 or later
    
    dat %>%
      mutate(hdiff = height - first(height), .by = id)
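
The question can be read two ways: difference from the first height in each id (what the code above computes), or difference from the previous height, which is what diff gives. As a minimal sketch of how the two readings differ, the base R example below computes both on a small subset of the question's data. The names dat_small, hdiff_first and hdiff_prev are made up for illustration and do not appear in the original post.

```r
# Illustrative subset: the first three rows of two ids from the question's data
dat_small <- data.frame(
  id     = c(80006L, 80006L, 80006L, 80016L, 80016L, 80016L),
  height = c(97.12, 101.35, 102.39, 105.01, 108.13, 115.58)
)

# Reading 1: difference from the first height within each id
dat_small$hdiff_first <- ave(dat_small$height, dat_small$id,
                             FUN = function(x) x - x[1])

# Reading 2: difference from the previous height within each id.
# diff() returns one element fewer than its input, so a leading 0 is
# prepended to keep the result the same length as the group.
dat_small$hdiff_prev <- ave(dat_small$height, dat_small$id,
                            FUN = function(x) c(0, diff(x)))

print(dat_small)
```

The two columns agree on the first two rows of each id and diverge from the third row on, so it is worth checking which one is actually wanted before choosing between x - x[1] and diff.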