Search code examples
rdataframedplyrtidyverse

Selecting top 5 rows along with keeping a certain row fixed


In my data frame (as below) I want to extract the top 5 brands by sales. I want to make sure that if my brand appears in top 5 then 5 values get picked, else top 5 + row for my brand gets picked (6 in total).

dff4 <- data.frame(stringsAsFactors = FALSE, check.names = FALSE, 
    Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
        "C", "C", "C", "C", "C"), Brand = c("B1", "B2", "B3", 
        "B4", "B5", "B6", "B7", "B8", "B9", "B10", "B11", "B12", 
        "B13", "B14", "B15"), Sales = c(2923, 2458, 2812, 2286, 
        1683, 1085, 2805, 3214, 1059, 1866, 3280, 2481, 2016, 1230, 1763))

Then I sort my data on Sales using the below code:

dff4 <- dff4[order(-dff4$Sales), ]

Now I need to only keep the top 5 values by sales in my data frame. This I can do using the below lines.

dff4 <- head(dff4, 5)

However, I also want to make sure that my own brand's data remain in the table, irrespective of it is appearing in top 5 or not. For example when I sort the above table, I can see that Brand "B4" does not appear in top 5. If I want to make sure that my code pick only top 5 if "B4" appears there else pick the top 5 and brand "B4" as the 6th value.

How do I go about it.


Solution

  • You could use slice_max() to select rows with 5 highest values of Sales, and then bind the row with Brand B4 behind. If B4 has been in the top 5 rows, distinct() will remove the duplicate one.

    library(dplyr)
    
    dff4 %>%
      slice_max(Sales, n = 5) %>%
      bind_rows(filter(dff4, Brand == "B4")) %>%
      distinct()
    
    #   Region Brand Sales
    # 1      C   B11  3280
    # 2      B    B8  3214
    # 3      A    B1  2923
    # 4      A    B3  2812
    # 5      B    B7  2805
    # 6      A    B4  2286