I asked a similar question some time ago but then subsequently realized that my problem was in fact more complex. Apologies for asking again.
df <- data.frame(
comp_name = c("A","A","B","B","A","A","B","B","C","C","D","D","C","C","D","D"),
country = c("US","US","US","US","US","US","US","US","France","France","France","France","France","France","France","France"),
year = c("2018","2018","2018","2018","2019","2019","2019","2019","2018","2018","2018","2018","2019","2019","2019","2019"),
type = c("profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue","profit", "revenue"),
value = c(10,20,30,40,20,30,40,50,140,150,120,130,100,110,80,90)
)
df:
comp_name country year type value
1 A US 2018 profit 10
2 A US 2018 revenue 20
3 B US 2018 profit 30
4 B US 2018 revenue 40
5 A US 2019 profit 20
6 A US 2019 revenue 30
7 B US 2019 profit 40
8 B US 2019 revenue 50
9 C France 2018 profit 140
10 C France 2018 revenue 150
11 D France 2018 profit 120
12 D France 2018 revenue 130
13 C France 2019 profit 100
14 C France 2019 revenue 110
15 D France 2019 profit 80
16 D France 2019 revenue 90
I want to add a rank column like so:
comp_name country year type value rank
1 A US 2018 profit 10
2 A US 2018 revenue 20
3 B US 2018 profit 30
4 B US 2018 revenue 40
5 A US 2019 profit 20 2
6 A US 2019 revenue 30
7 B US 2019 profit 40 1
8 B US 2019 revenue 50
9 C France 2018 profit 140
10 C France 2018 revenue 150
11 D France 2018 profit 120
12 D France 2018 revenue 130
13 C France 2019 profit 100 1
14 C France 2019 revenue 110
15 D France 2019 profit 80 2
16 D France 2019 revenue 90
I want to consider only profits in 2019 and rank the companies by their profits in each country.
When I previously asked the question, @KarthikS provided the following solution:
library(dplyr)
df %>% group_by(country) %>% mutate(rank = rank(desc(value)))
However, I've now added more variables (year and type), which I also want to consider.
Please do let me know if the question is unclear. I am new to R, and any help would be greatly appreciated. Thank you!
Calculate the ranks for all years, all types, all years, and then remove the values you don't need. (Or keep them.)
library(dplyr)
df %>%
group_by(country, year, type) %>%
mutate(rank = rank(desc(value))) %>%
ungroup() %>%
mutate(rank = if_else(year == 2019 & type == "profit", rank, NA_real_))
# # A tibble: 16 x 6
# comp_name country year type value rank
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 A US 2018 profit 10 NA
# 2 A US 2018 revenue 20 NA
# 3 B US 2018 profit 30 NA
# 4 B US 2018 revenue 40 NA
# 5 A US 2019 profit 20 2
# 6 A US 2019 revenue 30 NA
# 7 B US 2019 profit 40 1
# 8 B US 2019 revenue 50 NA
# 9 C France 2018 profit 140 NA
# 10 C France 2018 revenue 150 NA
# 11 D France 2018 profit 120 NA
# 12 D France 2018 revenue 130 NA
# 13 C France 2019 profit 100 1
# 14 C France 2019 revenue 110 NA
# 15 D France 2019 profit 80 2
# 16 D France 2019 revenue 90 NA