I have a dataframe crsppofo
which contains monthly financial data with several variables. Of significance for my question are the following:
PERMNO monthyear BetaShr
1: 85814 199501 0.5
2: 12345 199501 1.0
3: 85814 200002 1.5
4: 56789 200002 2.0
5: 12345 200002 2.5
"PERMNO"
describes each individual stock, "monthyear"
obviously shows the year and the month and "BetaShr"
is my risk measure sorted in ascending order.
What I'm trying to accomplish is to assign decile ranks (1 to 10) according to "BetaShr"
but grouped by "monthyear"
. The lowest decile rank should be assigned to the lowest 10% of "BetaShr"
of each month. The output should look like the following:
PERMNO monthyear BetaShr BetaDecileRank
1: 85814 199501 0.5 1
2: 12345 199501 1.0 10
3: 85814 200002 1.5 1
4: 56789 200002 2.0 5
5: 12345 200002 2.5 10
Of course this is just a simplified example where only three deciles are assigned to give you an example of my desired output (assuming a range of "BetaShr"
between 0.5 and 1.0 for 199501 and a range of 1.5 to 2.5 for 200002). You get the idea.
Through research I came up with this code:
library(purrr)
library(StatMeasures)
library(dplyr)
crsppofo <- crsppofo %>%
split(crsppofo$monthyear) %>%
map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))
resulting in the error:
Error: Column `BetaDecileRank` must be length 2524 (the group size) or one, not 896935
Any help regarding this problem would be greatly appreciated. Feel free to improve my code or propose a completely different approach. If you need any further information let me know through the comments. I am also open for improvements regarding my question and my presence at SO as I am just a newbie in this forum as well as in R.
The issue is that inside the split
group, the decile
is applied on the whole dataset column 'BetaShr' instead of the rows from that splitted dataset
... %>%
map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))
^^^^
It should be
decile(.$BetaShr)
-fullcode
library(dplyr)
library(purrr)
library(StatMeasures)
crsppofo <- crsppofo %>%
split(crsppofo$monthyear) %>%
map_df(~ mutate(., BetaDecileRank = decile(.$BetaShr)))
crsppofo
# PERMNO monthyear BetaShr BetaDecileRank
#1 85814 199501 0.5 1
#2 12345 199501 1.0 10
#3 85814 200002 1.5 1
#4 56789 200002 2.0 5
#5 12345 200002 2.5 10
Note that we don't need to split
and then loop using map
. Instead, it can be done by a group_by/mutate
option
crsppofo %>%
group_by(monthyear) %>%
mutate(BetaDecileRank = decile(BetaShr))
# A tibble: 5 x 4
# Groups: monthyear [2]
# PERMNO monthyear BetaShr BetaDecileRank
# <int> <int> <dbl> <int>
#1 85814 199501 0.5 1
#2 12345 199501 1 10
#3 85814 200002 1.5 1
#4 56789 200002 2 5
#5 12345 200002 2.5 10
crsppofo <- structure(list(PERMNO = c(85814L, 12345L, 85814L, 56789L, 12345L
), monthyear = c(199501L, 199501L, 200002L, 200002L, 200002L),
BetaShr = c(0.5, 1, 1.5, 2, 2.5)), class = "data.frame",
row.names = c("1:",
"2:", "3:", "4:", "5:"))