Hi I have the below dataset and code.
df <-
read.table(textConnection("Area CustomerAge CustomersInGroup
A 28 1
A 40 3
A 70 2
A 19 4
B 13 3
B 12 2
B 72 1
B 90 7"), header=TRUE)
df2 <- df %>%
mutate(
# Create categories
Customer_Age_Group = dplyr::case_when(
CustomerAge <= 18 ~ "0-18",
CustomerAge > 18 & CustomerAge <= 60 ~ "19-60",
CustomerAge > 60 ~ ">60"
))
df2 %>% group_by(Area, Customer_Age_Group,.drop = FALSE) %>%
count() %>%
ungroup() %>%
complete(Area, Customer_Age_Group, fill=list(n=0)) %>%
arrange(Area, parse_number(Customer_Age_Group))
Problem is that it provides a final count of applicable rows per group - I would like to amend this so instead the final count is of the values of 'CustomersInGroup' - e.g. for group A 19-60 this would sum to 8 (the values of the 3 rows) rather than the number of 3 representing the number of rows. Any help very much appreciated.
The final output should look like the below - totals of CustomersInGroup grouped by Area and then CustomerAge
Area | CustomerAge | Totals |
---|---|---|
A | 0-18 | 0 |
A | 19-60 | 8 |
A | >60 | 2 |
B | 0-18 | 5 |
B | 19-60 | 0 |
B | >60 | 8 |
You could use summarise
with sum
to get the totals of CustomersInGroup. You can use the following code:
library(dplyr)
library(tidyr)
library(readr)
df2 <- df %>%
mutate(
# Create categories
Customer_Age_Group = dplyr::case_when(
CustomerAge <= 18 ~ "0-18",
CustomerAge > 18 & CustomerAge <= 60 ~ "19-60",
CustomerAge > 60 ~ ">60"
))
df2 %>%
group_by(Area, Customer_Age_Group, .drop = FALSE) %>%
summarise(Totals = sum(CustomersInGroup)) %>%
ungroup() %>%
complete(Area, Customer_Age_Group, fill=list(Totals=0)) %>%
arrange(Area, parse_number(Customer_Age_Group))
#> `summarise()` has grouped output by 'Area'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> Area Customer_Age_Group Totals
#> <chr> <chr> <int>
#> 1 A 0-18 0
#> 2 A 19-60 8
#> 3 A >60 2
#> 4 B 0-18 5
#> 5 B 19-60 0
#> 6 B >60 8
Created on 2022-10-09 with reprex v2.0.2