I hope you can advise me on how to use the dplyr package effectively.
I have a data.frame with three columns: country, company name and satisfaction score.
Ultimately, I would like to find the nth company in every country, ranked by satisfaction score.
So far, I have done the following:
ordered_needed_data_ha %>% group_by(Country) %>% summarise(value = nth(`Satisfaction Score`, 10, order_by = Country))
#This gives me the 10th highest satisfaction score in the given country.
After piping to head(), this evaluates to:
  Country value
  <chr>   <dbl>
1 AL       NA
2 AT       15.7
3 BG       17.1
4 FR       14.9
5 IT       13.3
How can I add the company name that goes with each country's value?
Adding another grouping variable, as in group_by(Country, Company), does not work.
So the desired outcome would be something like:
  Country Company  value
  <chr>   <chr>    <dbl>
1 AL      CompanyX  NA
2 AT      CompanyX  15.7
3 BG      CompanyX  17.1
4 FR      CompanyX  14.9
5 IT      CompanyX  13.3
I am not a regular R user, so I would appreciate your help. Thanks!
It seems like the use of summarize is not necessary. I think you're effectively just doing a slice operation.
For instance, mimicking your code:
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize(value = nth(disp, 2))
# # A tibble: 3 × 2
#     cyl value
#   <dbl> <dbl>
# 1     4  147.
# 2     6  160
# 3     8  360
We can get the same rows (note the identical disp values), along with all of the remaining columns, using slice:
mtcars %>%
  group_by(cyl) %>%
  slice(2) %>%
  ungroup()
# # A tibble: 3 × 11
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
# 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
# 3  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
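Applied to data shaped like yours, the same idea might look like the sketch below. The `scores` data frame, its values, and the rank `k` are made up for illustration; only the column names come from your question. It sorts each country by score (highest first) and then keeps the k-th row, so the company name comes along with the value.

```r
library(dplyr)

# Hypothetical toy data; the column names mirror the question
scores <- tibble(
  Country = c("AT", "AT", "AT", "FR", "FR", "AL"),
  Company = c("A1", "A2", "A3", "F1", "F2", "L1"),
  `Satisfaction Score` = c(15.7, 18.2, 12.0, 14.9, 16.3, 11.1)
)

k <- 2  # which rank to keep (assumed; the question uses 10)

nth_company <- scores %>%
  group_by(Country) %>%
  # sort within each country, highest score first
  arrange(desc(`Satisfaction Score`), .by_group = TRUE) %>%
  # keep the k-th row of each country, with all of its columns
  slice(k) %>%
  ungroup()

nth_company
```

One difference from nth: a country with fewer than k rows (here AL) is dropped from the result instead of appearing with NA.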
If you really want to keep using summarize (for reasons not shown in the question, perhaps), then you can still use nth:
mtcars %>%
  group_by(cyl) %>%
  summarize(value = nth(disp, 2), value2 = nth(hp, 2))
# # A tibble: 3 × 3
#     cyl value value2
#   <dbl> <dbl>  <dbl>
# 1     4  147.     62
# 2     6  160     110
# 3     8  360     245
Or, for multiple columns, use across, which is perhaps a little more robust:
mtcars %>%
  group_by(cyl) %>%
  summarize(across(c(disp, hp), ~ nth(., 2)))
# # A tibble: 3 × 3
#     cyl  disp    hp
#   <dbl> <dbl> <dbl>
# 1     4  147.    62
# 2     6  160    110
# 3     8  360    245
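For your case, a summarize version could be sketched as follows. Again, the `scores` data and the rank `k` are hypothetical. Note that `order_by = Country` in your original call probably has no effect, since Country is constant within each group; ordering by the score itself (descending) is likely what you want, assuming the data is not already sorted.

```r
library(dplyr)

# Hypothetical toy data; the column names mirror the question
scores <- tibble(
  Country = c("AT", "AT", "AT", "FR", "FR", "AL"),
  Company = c("A1", "A2", "A3", "F1", "F2", "L1"),
  `Satisfaction Score` = c(15.7, 18.2, 12.0, 14.9, 16.3, 11.1)
)

k <- 2  # which rank to pick (assumed; the question uses 10)

nth_summary <- scores %>%
  group_by(Country) %>%
  summarize(
    # pick the company and the score at rank k, ranking by score (highest first)
    Company = nth(Company, k, order_by = desc(`Satisfaction Score`)),
    value   = nth(`Satisfaction Score`, k, order_by = desc(`Satisfaction Score`))
  )

nth_summary
```

Unlike slice, this keeps countries with fewer than k companies (here AL) and fills both columns with NA, which matches the NA row in your desired output.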