I have a dataframe like so
ID <- c('John', 'Bill', 'Alice','Paulina')
Type1 <- c(1,1,0,1)
Type2 <- c(0,1,1,0)
cluster <- c(1,2,3,1)
test <- data.frame(ID, Type1, Type2, cluster)
I want to group by cluster and sum the values in all the other columns apart from ID that should be dropped.
I achieved it through
test.sum <- test %>%
group_by(cluster)%>%
summarise(sum(Type1), sum(Type2))
However, I have thousands of types and I can't write out each column in summarise manually. Can you help me?
This is whereacross()
and contains
comes in incredibly useful to select the columns you want to summarise across:
test %>%
group_by(cluster) %>%
summarise(across(contains("Type"), sum))
cluster Type1 Type2
<dbl> <dbl> <dbl>
1 1 2 0
2 2 1 1
3 3 0 1
Alternatively, pivoting the dataset into long and then back into wide means you can easily analyse all groups and clusters at once:
library(dplyr)
library(tidyr)
test %>%
pivot_longer(-c(ID, cluster)) %>%
group_by(cluster, name) %>%
summarise(sum_value = sum(value)) %>%
pivot_wider(names_from = "name", values_from = "sum_value")
cluster Type1 Type2
<dbl> <dbl> <dbl>
1 1 2 0
2 2 1 1
3 3 0 1