Say I have the following data frame:
df <- data.frame(store = LETTERS[1:8],
sales = c( 9, 128, 54, 66, 23, 132, 89, 70),
successRate = c(.80, .25, .54, .92, .85, .35, .54, .46))
I want to rank the stores according to successRate
, with ties going to the store with more sales, so first I do this (just to make visualization easier):
df <- df[order(-df$successRate, -df$sales), ]
In order to actually create a ranking variable, I do the following:
df$rank <- ave(df$successRate, FUN = function(x) rank(-x, ties.method='first'))
So df
looks like this:
store sales successRate rank
4 D 66 0.92 1
5 E 23 0.85 2
1 A 9 0.80 3
7 G 89 0.54 4
3 C 54 0.54 5
8 H 70 0.46 6
6 F 132 0.35 7
2 B 128 0.25 8
The problem is I don't want small stores to be part of the ranking. Specifically, I want stores with less than 50 sales not to be ranked. So this is how I define df$rank
instead:
df$rank <- ifelse(df$sales < 50, NA,
ave(df$successRate, FUN = function(x) rank(-x, ties.method='first')))
The problem is that even though this correctly removes stores E and A, it doesn't reassign the rankings they were occupying. df
looks like this now:
store sales successRate rank
4 D 66 0.92 1
5 E 23 0.85 NA
1 A 9 0.80 NA
7 G 89 0.54 4
3 C 54 0.54 5
8 H 70 0.46 6
6 F 132 0.35 7
2 B 128 0.25 8
I've experimented with conditions inside and outside ave()
, but I can'r get R to do what I want! How can I get it to rank the stores like this?
store sales successRate rank
4 D 66 0.92 1
5 E 23 0.85 NA
1 A 9 0.80 NA
7 G 89 0.54 2
3 C 54 0.54 3
8 H 70 0.46 4
6 F 132 0.35 5
2 B 128 0.25 6
Super easy to do with data.table
:
library(data.table)
dt = data.table(df)
# do the ordering you like (note, could also use setkey to do this faster)
dt = dt[order(-successRate, -sales)]
dt[sales >= 50, rank := .I]
dt
# store sales successRate rank
#1: D 66 0.92 1
#2: E 23 0.85 NA
#3: A 9 0.80 NA
#4: G 89 0.54 2
#5: C 54 0.54 3
#6: H 70 0.46 4
#7: F 132 0.35 5
#8: B 128 0.25 6
If you must do it in data.frame
, then after your preferred order, run:
df$rank <- NA
df$rank[df$sales >= 50] <- seq_len(sum(df$sales >= 50))