
subtract rows from data frame in R with two factors using zoo or better package


I have a data.frame ordered by company ("gvkey") and calendar quarter ("datafqtr"), with an additional variable (for example, "dvy"):

gvkey datafqtr  dvy
1  1001   1983Q1 0.50
2  1001   1983Q2 1.50
3  1001   1983Q3 2.00
4  1001   1983Q4 4.50
5  1002   1984Q1 0.00
6  1002   1984Q2 0.00
7  1002   1984Q3 0.10
8  1002   1984Q4 0.45

What would be the best way to do the following in R (not necessarily using zoo)?

For this data.frame, for each company and for each year, subtract the third quarter from the fourth quarter, then the second quarter from the third quarter, and finally the first quarter from the second quarter. The first quarter stays as it is.

The output should look like this:

gvkey datafqtr  dvy
1  1001   1983Q1 0.50
2  1001   1983Q2 1.00
3  1001   1983Q3 0.50
4  1001   1983Q4 2.50
5  1002   1984Q1 0.00
6  1002   1984Q2 0.00
7  1002   1984Q3 0.10
8  1002   1984Q4 0.35

I have thought to do it the following way:

Use the zoo package to lag each variable by one period, keeping the company code ("gvkey") and calendar quarter ("datafqtr") as factors, and then subtract the lagged observation from the current one.

data<-zoo(data)

data<-data[order(data[,'gvkey'],data[,'datafqtr']),]

data$lagfqtr<-lag(data$datafqtr,-1,na.pad=TRUE)
data$laggvkey<-lag(data$gvkey,-1,na.pad=TRUE)

data$lagdvy<-lag(data$dvy,-1,na.pad=TRUE)

I then convert the zoo to a regular data.frame and subtract the lagged variable from the non-lagged variable:

data<-as.data.frame(data)

data[,] <- lapply(data[,], function(x) as.numeric(as.character(x)))

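# 'indice' is not defined in the code shown; it is presumably a logical flag
# (previous row belongs to the same company and year) coerced to 0/1 here
indice <- indice*1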

data$divq <- data$dvy - (data$lagdvy * indice)

I am not sure whether this does what I want.

Thank you, I appreciate all your help.


Solution

  • Using the dplyr package, you can subtract the lagged value from the current value within each company:

    library(dplyr)
    df %>%
      group_by(gvkey) %>%
      mutate(dvy = dvy - lag(dvy, default = 0))
    

    The output matches the desired result:

    Source: local data frame [8 x 3]
    Groups: gvkey [2]
    
      gvkey datafqtr   dvy
      <int>    <chr> <dbl>
    1  1001   1983Q1  0.50
    2  1001   1983Q2  1.00
    3  1001   1983Q3  0.50
    4  1001   1983Q4  2.50
    5  1002   1984Q1  0.00
    6  1002   1984Q2  0.00
    7  1002   1984Q3  0.10
    8  1002   1984Q4  0.35
    

    Note: this works only if the data is already ordered by gvkey and datafqtr. Otherwise, add an arrange() step to the pipe before the mutate().
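
The question asks for differences "for each company and for each year", while the pipe above groups by gvkey only. That is enough for the sample data, where each company has a single year. As a minimal sketch for data with several years per company, one could also group by the year and sort explicitly first. This assumes that the first four characters of datafqtr are the year and that dvy restarts at the beginning of each year; the column name fyear is made up for illustration, and divq is borrowed from the question's code.

```r
library(dplyr)

# Sample data from the question, deliberately scrambled to show the sorting step
df <- data.frame(
  gvkey    = c(1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002),
  datafqtr = c("1983Q1", "1983Q2", "1983Q3", "1983Q4",
               "1984Q1", "1984Q2", "1984Q3", "1984Q4"),
  dvy      = c(0.50, 1.50, 2.00, 4.50, 0.00, 0.00, 0.10, 0.45),
  stringsAsFactors = FALSE
)
df <- df[c(4, 1, 7, 2, 8, 3, 6, 5), ]

result <- df %>%
  # Assumption: datafqtr looks like "1983Q1", so characters 1-4 are the year
  mutate(fyear = substr(datafqtr, 1, 4)) %>%
  # "1983Q1" < "1983Q2" < ... sorts correctly as plain text
  arrange(gvkey, datafqtr) %>%
  # Restart the differencing for every company and every year
  group_by(gvkey, fyear) %>%
  # The first quarter of each group has no predecessor, so 0 is subtracted
  mutate(divq = dvy - dplyr::lag(dvy, default = 0)) %>%
  ungroup() %>%
  select(-fyear)

print(result)
```

For the sample data, divq reproduces the dvy column of the desired output. Writing dplyr::lag explicitly avoids accidentally calling stats::lag, which behaves differently on plain vectors.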