How do I Difference Panel Data in R


I am wondering whether there are any R commands or packages that will let me easily add variables to a data.frame that are the "difference", or change over time, of existing variables.

If my data looked like this:

set.seed(1)
MyData <- data.frame(Day = 0:9 %% 5 + 1,
                     Price = rpois(10, 10),
                     Good = rep(c("apples", "oranges"), each = 5))
MyData

   Day Price    Good
1    1     8  apples
2    2    10  apples
3    3     7  apples
4    4    11  apples
5    5    14  apples
6    1    12 oranges
7    2    11 oranges
8    3     9 oranges
9    4    14 oranges
10   5    11 oranges

Then after "first differencing" the price variable, my data would look like this.

   Day Price    Good P1d
1    1     8  apples  NA
2    2    10  apples   2
3    3     7  apples  -3
4    4    11  apples   4
5    5    14  apples   3
6    1    12 oranges  NA
7    2    11 oranges  -1
8    3     9 oranges  -2
9    4    14 oranges   5
10   5    11 oranges  -3

Solution

    ave

    transform(MyData, P1d = ave(Price, Good, FUN = function(x) c(NA, diff(x))))
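
    The ave call differences consecutive rows within each Good, so it relies on the rows already being ordered by Day within each good. A minimal base R sketch of sorting first, assuming the input might arrive unsorted; the names shuffled and sorted are illustrative, and the Price values are typed in from the question's table rather than drawn with rpois:

```r
# Example data using the Price values shown in the question's table.
MyData <- data.frame(Day = rep(1:5, times = 2),
                     Price = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11),
                     Good = rep(c("apples", "oranges"), each = 5))

# Shuffle the rows to mimic unsorted input (illustration only).
shuffled <- MyData[sample(nrow(MyData)), ]

# Sort by Good, then Day, so diff() compares consecutive days.
sorted <- shuffled[order(shuffled$Good, shuffled$Day), ]

# ave() applies FUN within each Good and returns values in row order.
# diff() returns one fewer value than its input, hence the leading NA.
sorted$P1d <- ave(sorted$Price, sorted$Good,
                  FUN = function(x) c(NA, diff(x)))
sorted
```

    If the data are already sorted, as in the question, the order step changes nothing and can be dropped.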
    

    ave/gsubfn

    The ave solution can be shortened slightly using fn$ from the gsubfn package:

    library(gsubfn)
    transform(MyData, P1d = fn$ave(Price, Good, FUN = ~ c(NA, diff(x))))
    

    dplyr

    library(dplyr)
    
    MyData %>%
      group_by(Good) %>%                       # difference within each good
      mutate(P1d = Price - lag(Price)) %>%     # lag() gives NA for each group's first row
      ungroup()
    

    data.table

    library(data.table)
    
    dt <- data.table(MyData)
    dt[, P1d := c(NA, diff(Price)), by = Good]
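
    A variant of the data.table approach, sketched here under the assumption that a data.table version providing shift() is installed. It mirrors the dplyr lag() solution, sorts explicitly before differencing, and prints the result, since := modifies the table in place and prints nothing. The Price values are typed in from the question's table:

```r
library(data.table)

# Example data using the Price values shown in the question's table.
dt <- data.table(Day = rep(1:5, times = 2),
                 Price = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11),
                 Good = rep(c("apples", "oranges"), each = 5))

# Sort by Good, then Day, so each row's predecessor is the previous day.
setorder(dt, Good, Day)

# shift() lags Price by one row within each Good;
# the first row of each group has no predecessor and gets NA.
dt[, P1d := Price - shift(Price), by = Good]

# := updates dt by reference without printing, so print explicitly.
print(dt)
```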
    

    Update

    dplyr now uses %>% instead of %.%.