Search code examples
rdplyrtime-seriesfinancerank

R: Calculate decile ranks by group


I have a dataframe crsppofo which contains monthly financial data with several variables. Of significance for my question are the following:

   PERMNO monthyear BetaShr
1:  85814 199501    0.5
2:  12345 199501    1.0
3:  85814 200002    1.5
4:  56789 200002    2.0
5:  12345 200002    2.5

"PERMNO" describes each individual stock, "monthyear" obviously shows the year and the month and "BetaShr" is my risk measure sorted in ascending order.

What I'm trying to accomplish is to assign decile ranks (1 to 10) according to "BetaShr" but grouped by "monthyear". The lowest decile rank should be assigned to the lowest 10% of "BetaShr" of each month. The output should look like the following:

   PERMNO monthyear BetaShr BetaDecileRank
1:  85814 199501    0.5     1
2:  12345 199501    1.0     10
3:  85814 200002    1.5     1
4:  56789 200002    2.0     5
5:  12345 200002    2.5     10

Of course this is just a simplified example where only three deciles are assigned to give you an example of my desired output (assuming a range of "BetaShr" between 0.5 and 1.0 for 199501 and a range of 1.5 to 2.5 for 200002). You get the idea.

Through research I came up with this code:

library(purrr)
library(StatMeasures)
library(dplyr)
crsppofo <- crsppofo %>%
  split(crsppofo$monthyear) %>%
  map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))

resulting in the error:

Error: Column `BetaDecileRank` must be length 2524 (the group size) or one, not 896935

Any help regarding this problem would be greatly appreciated. Feel free to improve my code or propose a completely different approach. If you need any further information let me know through the comments. I am also open for improvements regarding my question and my presence at SO as I am just a newbie in this forum as well as in R.


Solution

  • The issue is that inside the split group, the decile is applied on the whole dataset column 'BetaShr' instead of the rows from that splitted dataset

    ... %>%
        map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))
                                                   ^^^^
    

    It should be

    decile(.$BetaShr)
    

    -fullcode

    library(dplyr)
    library(purrr)
    library(StatMeasures)
    crsppofo <- crsppofo %>%
                  split(crsppofo$monthyear) %>%
                  map_df(~ mutate(., BetaDecileRank = decile(.$BetaShr)))
    crsppofo
    #  PERMNO monthyear BetaShr BetaDecileRank
    #1  85814    199501     0.5              1
    #2  12345    199501     1.0             10
    #3  85814    200002     1.5              1
    #4  56789    200002     2.0              5
    #5  12345    200002     2.5             10
    

    Note that we don't need to split and then loop using map. Instead, it can be done by a group_by/mutate option

    crsppofo %>% 
       group_by(monthyear) %>% 
       mutate(BetaDecileRank = decile(BetaShr))
    # A tibble: 5 x 4
    # Groups:   monthyear [2]
    #  PERMNO monthyear BetaShr BetaDecileRank
    #   <int>     <int>   <dbl>          <int>
    #1  85814    199501     0.5              1
    #2  12345    199501     1               10
    #3  85814    200002     1.5              1
    #4  56789    200002     2                5
    #5  12345    200002     2.5             10
    

    data

    crsppofo <- structure(list(PERMNO = c(85814L, 12345L, 85814L, 56789L, 12345L
    ), monthyear = c(199501L, 199501L, 200002L, 200002L, 200002L), 
        BetaShr = c(0.5, 1, 1.5, 2, 2.5)), class = "data.frame",
        row.names = c("1:", 
    "2:", "3:", "4:", "5:"))