My data frame looks like this:
View(df)
Product Value
a 2
b 4
c 3
d 10
e 15
f 5
g 6
h 4
i 50
j 20
k 35
l 25
m 4
n 6
o 30
p 4
q 40
r 5
s 3
t 40
I want to find the 9 most expensive products and summaries the rest. It should look like this:
Product Value
d 10
e 15
i 50
j 20
k 35
l 25
o 30
q 40
t 40
rest 46
Rest is the sum of the other 11 products.
I tried it with summaries
, but it didn't work:
new <- df %>%
group_by(Product)%>%
summarise((Value > 10) = sum(Value)) %>%
ungroup()
We can use dplyr::row_number
to effectively rank the observations after using arrange
to order the data by Value
. Then, we augment the Product
column so that any values that aren't in the top 9 are coded as Rest
. Finally, we group by the updated Product
and take the sum using summarise
dat %>%
arrange(desc(Value)) %>%
mutate(RowNum = row_number(),
Product = ifelse(RowNum <= 9, Product, 'Rest')) %>%
group_by(Product) %>%
summarise(Value = sum(Value))
# A tibble: 10 × 2
Product Value
<chr> <int>
1 d 10
2 e 15
3 i 50
4 j 20
5 k 35
6 l 25
7 o 30
8 q 40
9 Rest 46
10 t 40
dat <- structure(list(Product = c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"
), Value = c(2L, 4L, 3L, 10L, 15L, 5L, 6L, 4L, 50L, 20L, 35L,
25L, 4L, 6L, 30L, 4L, 40L, 5L, 3L, 40L)), .Names = c("Product",
"Value"), class = "data.frame", row.names = c(NA, -20L))