I have a df of players and their penalty points. It looks something like this:
# playerID penalty
# A 0
# B 2
# A 4
# C 2
# A 0
# B 0
# B 2
I want to get a "player rank" which depends on the "penalty" and number of played games (A three games, C one game). If there is a player X with average penalty = 3 over 10 games and player Y with average penalty = 3 over 4 games then the "player rank" of the player X should be lower then the "player rank" of the player Y.
My idea was to divide the average penalty of player i by sqrt(n(i)), where n(i) is number of games played by player i. Second idea was to divide average penalty by log(n(i)+1).
Do you think its the right approach? Was somebody dealing with a similar problem?
Thanks for your responses.
I think you want to transfer this post to Cross Validated. Anyway, this is the solution to your problem via data.table.
library(data.table)
df = data.frame( playerID = c( "A", "B", "A", "C", "A", "B", "B"),
penalty = c( 0, 2, 4, 2, 0, 0, 2 ) )
setDT(df)
df[ , games := .N, by = playerID ]
df[ , mean_penalty := mean( penalty ), by = playerID ]
df[ , `:=` ( rank1 = mean_penalty / sqrt(games),
rank2 = mean_penalty / log(games+1) ) ]
final = unique(df, by = "playerID" )
> final
playerID penalty games mean_penalty rank1 rank2
1: A 0 3 1.333333 0.7698004 0.9617967
2: B 2 3 1.333333 0.7698004 0.9617967
3: C 2 1 2.000000 2.0000000 2.8853901