Search code examples
rlinear-regressionrollapply

Beta estimation over panel data by group


I found some previous questions on this topic especially this R: Grouped rolling window linear regression with rollapply and ddply and R: Rolling / moving avg by group , however, both questions did not provide an exact solution for the problem that I am facing. I am currently trying to estimate CAPM beta over panel data using a linear regression. So I have different funds (in the example below I used 3 fund groups) for which I would like to calculate the betas separately and per row. To put this more abstract: I am trying to do a linear regression with a moving window by group to estimate the coefficient for every row based on the data in the window.

install.packages("zoo","dplyr")
library(zoo);library(dplyr)

# Create dataframe
fund <- as.numeric(c(1,1,1,1,1,1,1,1,3,3,3,3,3,3,2,2,2,2,2,2,2))
return<- as.numeric(c(1:21))
benchmark <- as.numeric(c(1,13,14,20,14,32,4,1,5,7,1,0,7,1,-2,1,6,-7,9,10,9))
riskfree<-as.numeric(c(1,5,1,2,1,6,4,7,5,-5,10,0,3,1,2,1,6,7,8,9,10))
date <- as.Date(c("2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
              "2011-02-28","2010-07-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
              "2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30"))
funddata<-data.frame(date,fund,return,benchmark,riskfree)

# Creating variables of interest
funddata["ret_riskfree"]<-as.numeric(funddata$return-funddata$riskfree)
funddata["benchmark_riskfree"]<-as.numeric(funddata$benchmark-funddata$riskfree)

I want to do a rolling regression over two columns df[6:7] for every group indicated by the column "fund". The calculation should be done separately so the first two rows in the beta column for every fund group will always show "NA". In the end I want to have a full dataframe with all fund groups and all beta values combined. I managed to come up with a new code that works but is pretty messy and it requires to order the data by fund & date before executing. I would welcome any suggestions on how to make it better.

 funddata <- funddata[order(funddata$fund, funddata$date),]   
 beta_func <- function(x, benchmark_riskfree, ret_riskfree) {
  a <- coef(lm(as.formula(paste(ret_riskfree, "~", benchmark_riskfree,-1)), 
           data = x))
  return(a)
}
beta_list<-list()
for (i in c(1:3)){beta_list[[paste(i, sep="_")]]<-    (rollapplyr(funddata[(funddata$fund==i),6:7], width = 3,
           FUN = function(x) beta_func(as.data.frame(x), "benchmark_riskfree" ,   "ret_riskfree"),
           by.column = FALSE,fill=NA))}
beta_list<-unlist(beta_list, recursive=FALSE)
funddata$beta<-beta_list

Solution

  • As I mentioned in the comment above, this solution might be a bit off since I'm not able to reproduce your desired output 100%. Still, the functionality of what you're trying to accomplish is there. Have a look at it and let me know if this is something you could use or I could develop further.

    EDIT: The code below does not reproduce the desired output as specified above, but turned out to be what the OP was looking for after all.

    Here goes:

    # Datasource
    fund <- as.numeric(c(1,1,1,1,1,1,1,1,3,3,3,3,3,3,2,2,2,2,2,2,2))
    return<- as.numeric(c(1:21))
    benchmark <- as.numeric(c(1,13,14,20,14,32,4,1,5,7,1,0,7,1,-2,1,6,-7,9,10,9))
    riskfree<-as.numeric(c(1,5,1,2,1,6,4,7,5,-5,10,0,3,1,2,1,6,7,8,9,10))
    date <- as.Date(c("2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
                      "2011-02-28","2010-07-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30",
                      "2010-07-30","2010-08-31","2010-09-30","2010-10-31","2010-11-30","2010-12-31","2011-01-30"))
    funddata<-data.frame(date,fund,return,benchmark,riskfree)
    
    # Creating variables of interest
    funddata["ret_riskfree"]<-as.numeric(funddata$return-funddata$riskfree)
    funddata["benchmark_riskfree"]<-as.numeric(funddata$benchmark-funddata$riskfree)
    
    # Target check #################################################################
    # Subset last three rows in original dataframe
    df_check <- funddata[funddata$fund == 1,]
    df_check <- tail(df_check,3)
    
    # Run regression check
    mod_check <- lm(df_check$ret_riskfree~df_check$benchmark_riskfree)
    coef(mod_check)
    
    # My suggestion ################################################################
    # The following function takes three arguments:
    # 1. a dataframe, myDf
    # 2. a column that you'd like to myDf on
    # 3. a window length for a sliding window, myWin
    
    fun_rollreg <- function(myDf, subCol, varY, varX, myWin){
      df_main <- myDf
      
      # Make an empty data frame to store results in
      df_data <- data.frame()
      
      # Identify unique funds
      unFunds <- unique(unlist(df_main[subCol]))
      
      # Loop through your subset
      for (fundx in unFunds){
        
        # Subset
        df <- df_main
        df <- df[df$fund == fundx,]
        
        # Keep a copy of the original until later
        df_new <- df
        
        # Specify a container for your beta estimates
        betas <- c()
        
        # Specify window length
        wlength <- myWin
     
        # Retrieve some data dimensions to loop on
        rows = dim(df)[1]
        periods <- rows - wlength
        
        # Loop through each subset of the data
        # and run regression
        for (i in rows:(rows - periods)){
          
            # Split dataframe in subsets
            # according to the window length
            df1 <- df[(i-(wlength-1)):i,]
        
            # Run regression
            beta <- coef(lm(df1[[varY]]~df1[[varX]]))[2]
            
            # Keep regression ressults
            betas[[i]] <- beta
        }
        # Add regression data to dataframe
        df_new <- data.frame(df, betas)
        
        # Keep the new dataset for later concatenation
        df_data <- rbind(df_data, df_new)
      }   
      return(df_data)
    }
    
    # Run the function:
    df_roll <- fun_rollreg(myDf = funddata, subCol = 'fund',
                          varY <- 'ret_riskfree', varX <- 'benchmark_riskfree',
                          myWin = 3)
    # Show the results
    print(head(df_roll,8))
    

    For the first 8 rows in the new dataframe (fund = 1), this is the result:

      date         fund return benchmark riskfree ret_riskfree benchmark_riskfree       betas
    1 2010-07-30    1      1         1        1            0                  0          NA
    2 2010-08-31    1      2        13        5           -3                  8          NA
    3 2010-09-30    1      3        14        1            2                 13  0.10465116
    4 2010-10-31    1      4        20        2            2                 18  0.50000000
    5 2010-11-30    1      5        14        1            4                 13 -0.20000000
    6 2010-12-31    1      6        32        6            0                 26 -0.30232558
    7 2011-01-30    1      7         4        4            3                  0 -0.11538462
    8 2011-02-28    1      8         1        7            1                 -6 -0.05645161