Search code examples
rpanelrankweighted-average

Calculating weighted average based on criteria


I have a panel data set, where I have tens of years of data for thousands of assets.

There are also rows with NA values in RET/MV/RANK_LAG, which I would like to ignore.

I have got my data to this data frame, where I have months (in class(date)), asset codes, monthly returns and market capitalization. In addition. I have decile ranks that change every month.

Here is an example dataset. Instead of deciles I have just two ranks for the sake of this example dataset:

DATE CODE RET MV RANK_LAG
2000-01-01 999999 0.02 10 1
2000-01-01 2222X2 -0.01 20 2
2000-01-01 123456 0.05 30 2
2000-01-01 5555Y5 0.00 5 1
2000-02-01 999999 0.05 10 2
2000-02-01 2222X2 0.10 20 2
2000-02-01 123456 -0.1 30 1
2000-02-01 5555Y5 -0.1 5 1

What I would like to do is to calculate a weighted average return (using MV as weights) for each RANK_LAG, for each DATE.

So a desired output would be:

DATE RANK WEIGHTED_RET
2000-01-01 1 0.013
2000-01-01 2 0.026
2000-02-01 1 -0.100
2000-02-01 2 0.083

Solution

  • Use base function weighted.mean in a dplyr pipe.

    df1 <- read.table(text = "DATE  CODE    RET     MV  RANK_LAG
    2000-01-01  999999  0.02    10  1
    2000-01-01  2222X2  -0.01   20  2
    2000-01-01  123456  0.05    30  2
    2000-01-01  5555Y5  0.00    5   1
    2000-02-01  999999  0.05    10  2
    2000-02-01  2222X2  0.10    20  2
    2000-02-01  123456  -0.1    30  1
    2000-02-01  5555Y5  -0.1    5   1
    ", header = TRUE, colClasses = c("Date", "character", rep("numeric", 3)))
    
    suppressPackageStartupMessages(
      library(dplyr)
    )
    
    df1 %>%
      group_by(DATE, RANK_LAG) %>%
      summarise(WEIGHTED_RET = weighted.mean(RET, MV, na.rm = TRUE), .groups = "drop")
    #> # A tibble: 4 × 3
    #>   DATE       RANK_LAG WEIGHTED_RET
    #>   <date>        <dbl>        <dbl>
    #> 1 2000-01-01        1       0.0133
    #> 2 2000-01-01        2       0.026 
    #> 3 2000-02-01        1      -0.1   
    #> 4 2000-02-01        2       0.0833
    

    Created on 2023-02-07 with reprex v2.0.2