
R [dplyr]: Adding a column after applying a summary function


I hope you can advise me on how to use the dplyr package effectively.

I have a data.frame with three columns, country, company name and satisfaction score.

Ultimately, I would like to find the nth company in every country, ranked by satisfaction score.

So far, I have done the following:

ordered_needed_data_ha %>% group_by(Country) %>% summarise(value = nth(`Satisfaction Score`, 10, order_by = Country))
#This gives me the 10th highest satisfaction score in the given country. 

After piping to head(), this evaluates to:

  Country value
  <chr>   <dbl>
1 AL       NA
2 AT       15.7
3 BG       17.1
4 FR       14.9
5 IT       13.3

How can I add the name of the company that has the given value in each country? Adding another grouping variable, as in group_by(Country, Company), does not work.

So the desired outcome would be something like:

  Country Company  value
  <chr>   <chr>    <dbl>
1 AL      CompanyX  NA
2 AT      CompanyX  15.7
3 BG      CompanyX  17.1
4 FR      CompanyX  14.9
5 IT      CompanyX  13.3

I am not a regular R user, so I would appreciate your help. Thanks!


Solution

  • It seems that summarize is not necessary here. I think you're effectively just doing a slice operation: picking one whole row per group.

    For instance, mimicking your code:

    library(dplyr)
    mtcars %>%
      group_by(cyl) %>%
      summarize(value = nth(disp, 2))
    # # A tibble: 3 × 2
    #     cyl value
    #   <dbl> <dbl>
    # 1     4  147.
    # 2     6  160 
    # 3     8  360 
    

    We can get the same rows (note the identical disp values), along with all of the remaining columns, using slice:

    mtcars %>%
      group_by(cyl) %>%
      slice(2) %>%
      ungroup()
    # # A tibble: 3 × 11
    #     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
    # 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
    # 3  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
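
    One difference between the two approaches is worth checking before switching: when a group has fewer rows than the requested position, nth returns NA for that group, while slice simply returns no row for it. That may be why AL shows NA in the question's output, though that is a guess without seeing the data. A minimal sketch on made-up data (toy, grp, name and score are hypothetical names, not from the question):

    ```r
    library(dplyr)

    # Hypothetical toy data: group "a" has three rows, group "b" only one.
    toy <- tibble(
      grp   = c("a", "a", "a", "b"),
      name  = c("a1", "a2", "a3", "b1"),
      score = c(3, 2, 1, 5)
    )

    # nth() keeps every group and yields NA where there is no 2nd row.
    via_nth <- toy %>%
      group_by(grp) %>%
      summarize(value = nth(score, 2))

    # slice() returns only rows that exist, so group "b" drops out.
    via_slice <- toy %>%
      group_by(grp) %>%
      slice(2) %>%
      ungroup()
    ```

    Here via_nth has one row per group, with NA for "b", whereas via_slice contains only the second row of group "a".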
    

    If you really want to keep using summarize (perhaps for reasons not shown here), you can still use nth, calling it once for each column you want to keep:

    mtcars %>%
      group_by(cyl) %>%
      summarize(value = nth(disp, 2), value2 = nth(hp, 2))
    # # A tibble: 3 × 3
    #     cyl value value2
    #   <dbl> <dbl>  <dbl>
    # 1     4  147.     62
    # 2     6  160     110
    # 3     8  360     245
    

    Or use across to apply nth to several columns at once, which avoids repeating the call and is perhaps a little more robust:

    mtcars %>%
      group_by(cyl) %>%
      summarize(across(c(disp, hp), ~ nth(., 2)))
    # # A tibble: 3 × 3
    #     cyl  disp    hp
    #   <dbl> <dbl> <dbl>
    # 1     4  147.    62
    # 2     6  160    110
    # 3     8  360    245
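
    Applied to data shaped like the question's, the slice approach might look like the sketch below. The data frame companies, its values, and the helper variable k are invented for illustration. Only the column names Country, Company and `Satisfaction Score` are taken from the question. The sketch sorts within each country explicitly instead of relying on the input already being ordered.

    ```r
    library(dplyr)

    # Hypothetical stand-in for the question's data; values are made up.
    companies <- tibble(
      Country = c("AT", "AT", "AT", "BG", "BG", "BG", "AL"),
      Company = c("A1", "A2", "A3", "B1", "B2", "B3", "L1"),
      `Satisfaction Score` = c(15.7, 18.2, 12.4, 17.1, 16.0, 19.5, 11.0)
    )

    k <- 2  # the question uses 10; 2 keeps the toy data small

    nth_company <- companies %>%
      group_by(Country) %>%
      # Sort within each country, highest score first.
      arrange(desc(`Satisfaction Score`), .by_group = TRUE) %>%
      # Keep the k-th row of each country, company name included.
      slice(k) %>%
      ungroup() %>%
      rename(value = `Satisfaction Score`)
    ```

    With this toy data, nth_company holds A1 (15.7) for AT and B1 (17.1) for BG. AL has only one company, so it is dropped. If such countries should appear with NA instead, as in the desired output, the summarize and nth variants above keep them.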