
Compute differences relative to start date by group


I have a data frame like this:

year <- c(2001, 2001, 2001, 2006, 2006, 2006, 2007, 2007, 2007)
group <- c("a", "b", "c", "a", "b", "c", "a", "b", "c")
value <- c(10, 50, 100, 20, 5, 200, 25, 50, 250)
mydf <- data.frame(year, group, value)    

I would like to compute the differences and proportional changes in value for 2006 and 2007 relative to 2001. I understand how first differences by group can be computed with data.table, as in:

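library(data.table)
mydf <- data.table(mydf)

# First difference within each group (NA for each group's first row)
mydf[, D.value := c(NA, diff(value)), by = group]
# Proportional first difference: change divided by the previous value
mydf[, PD.value := c(NA, diff(value) / value[-.N]), by = group]

# Convert back to a plain data frame
mydf <- data.frame(mydf)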

I also understand how differences relative to a start date can be computed for a time series, as explained here. But I can't work out how to compute differences in value relative to a base year within each group. Any help would be much appreciated.


Solution

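  • # mydf must be a data.table here, so skip the final data.frame() conversion
    # Difference from each group's 2001 value
    mydf[, diffs := value - value[year == 2001], by = group]
    # The same difference as a proportion of the 2001 value
    mydf[, propdiffs := diffs / value[year == 2001], by = group]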
    #   year group value diffs propdiffs
    #1: 2001     a    10     0       0.0
    #2: 2001     b    50     0       0.0
    #3: 2001     c   100     0       0.0
    #4: 2006     a    20    10       1.0
    #5: 2006     b     5   -45      -0.9
    #6: 2006     c   200   100       1.0
    #7: 2007     a    25    15       1.5
    #8: 2007     b    50     0       0.0
    #9: 2007     c   250   150       1.5
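
For reference, here is a self-contained sketch of the same approach that can be run from a fresh session. It looks up each group's base-year value once and reuses it for both columns. The names `base_year` and `base` are introduced only for illustration, and taking the first match with `[1L]` is an assumption about how a group without a base-year row should be handled: it then gets NA rather than raising an error.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
mydf <- data.table(
  year  = c(2001, 2001, 2001, 2006, 2006, 2006, 2007, 2007, 2007),
  group = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  value = c(10, 50, 100, 20, 5, 200, 25, 50, 250)
)

base_year <- 2001  # illustrative name for the reference year

# Store each group's base-year value in a helper column.
# [1L] returns NA if the group has no row for base_year.
mydf[, base := value[year == base_year][1L], by = group]

# Absolute and proportional change relative to the base year
mydf[, diffs := value - base]
mydf[, propdiffs := diffs / base]

# Drop the helper column
mydf[, base := NULL]

print(mydf)
```

Changing `base_year` to another year present in every group should give differences relative to that year instead.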