I am wondering whether there is a package or a fast way to generate a statistical summary table for the results of a clustering. I imagine I could choose the variables of interest, group by cluster number, and then calculate the mean, max, etc., but I am looking for a fast way to do it. Is there a package I can use?
Thanks
The fastest and easiest way depends on the exact results you want. The easiest approach is probably summary() in base R, which summarizes every column of the whole data frame. The more versatile approach is the dplyr package, with its functions group_by() and summarize(). For specific types of data, other packages may provide a more practical summary.
An example:
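# Example data: 20 observations, each with a random group label
DF <- data.frame(groups = sample(LETTERS, 20, replace = TRUE),
                 var = runif(20))

# Base R: overall summary of each column (not split by group)
summary(DF)

# dplyr: one row per group, with the group mean and the number of observations
library(dplyr)
DF %>%
  group_by(groups) %>%
  summarize(mean_by_group = mean(var),
            number = n())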
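
To connect the example above to the clustering case in the question, here is a minimal sketch that summarizes several chosen variables per cluster. It assumes dplyr 1.0.0 or later (for across()), and it uses the built-in iris data with kmeans() and three clusters purely for illustration; the names vars, km, and cluster_summary are made up for this sketch and would be replaced by your own data and clustering result.

```r
library(dplyr)

# Illustrative clustering only: k-means with 3 centers on two iris columns
set.seed(1)
vars <- c("Sepal.Length", "Petal.Length")  # variables of interest (assumed)
km <- kmeans(iris[, vars], centers = 3)

# Attach the cluster label, then compute mean and max of each chosen
# variable per cluster, plus the cluster size
cluster_summary <- iris %>%
  mutate(cluster = km$cluster) %>%
  group_by(cluster) %>%
  summarize(across(all_of(vars), list(mean = mean, max = max)),
            n = n())

print(cluster_summary)
```

The result has one row per cluster, with columns named like Sepal.Length_mean and Sepal.Length_max. Adding more functions to the list (for example min or sd) adds more columns without changing the rest of the pipeline.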