I have below data:
structure(list(Filters = structure(c(4L, 2L, 6L, 5L, 1L, 3L,
7L), .Label = c("product type : shade || shade family : white",
"space : bedroom || product type : appliances", "space : bedroom || shade family : white",
"space : living room || product type : shade || shade family : white",
"space : living room || product type : shade|appliances", "space : living room || shade family : white|black",
"space : living room || shade family : yellows & greens"), class = "factor"),
Count = c(143L, 131L, 119L, 101L, 94L, 84L, 82L)), class = "data.frame", row.names = c(NA,
-7L))
I want to get sum of count for each category and subcategory available in data. also if category has multiple subcategories then each category should have sum of count for each.
For e.g. to get sum of (product type : appliances) => product type : appliances & product type : shade|appliances both are eligible.
I am looking for output like:
Category Count
1 space 660
2 product type 469
3 shade family 522
4 space : living room 445
5 product type : shade 338
6 product type : appliances 232
7 So on …. ..
Thanks in advance.
One approach is with separate_rows
and separate
:
library(dplyr)
library(tidry)
df %>%
separate_rows(Filters,sep = " \\|\\| ") %>%
separate(Filters, into = c("Category","Subcategory")," : ") %>%
group_by(Category, Subcategory) %>%
summarise(Count = sum(Count)) %>%
bind_rows(group_by(.,Category) %>%
summarize(Count = sum(Count)) %>%
mutate(Subcategory = "Total")) %>%
separate_rows(Subcategory,sep = "\\|") %>%
arrange(-Count)
# A tibble: 13 x 3
# Groups: Category [3]
Category Subcategory Count
<chr> <chr> <int>
1 space Total 660
2 shade family Total 522
3 product type Total 469
4 space living room 445
5 shade family white 321
6 product type shade 237
7 space bedroom 215
8 product type appliances 131
9 shade family white 119
10 shade family black 119
11 product type shade 101
12 product type appliances 101
13 shade family yellows & greens 82
Thanks to this answer from @Bishops_Guest for some inspiration.