r, group-by, cluster-analysis

How to create a statistical summary of clustering results for different groups of variables in R


I am wondering whether there is a package or a fast way to generate a statistical summary table for the results of a clustering. I imagine choosing the variables of interest, grouping by cluster number, and then calculating the mean, max, etc. for each cluster. Is there a package I can use for this?

Thanks


Solution

  • The fastest and easiest way depends on the exact results you want. The simplest option is probably summary() in base R, which summarizes each column over the whole data frame rather than per group. A more versatile option is the dplyr package, whose group_by() and summarize() functions compute any statistic you choose for each group (here, each cluster). For specific types of data, other packages may provide a more practical summary.

    An example:

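    set.seed(1)  # make the random example data reproducible
    DF <- data.frame(groups = sample(LETTERS, 20, replace = TRUE),
                     var = runif(20))
    
    # Base R: overall summary of every column (not split by group)
    summary(DF)
    
    # dplyr: one row per group with the chosen statistics
    library(dplyr)
    DF %>%
      group_by(groups) %>%
      summarize(mean_by_group = mean(var),
                number = n())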
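
To connect this more directly to clustering output, here is a minimal sketch that groups by cluster number and computes the mean and max of a chosen set of variables. It rests on assumptions that are not part of the question: the built-in iris data stands in for your data, kmeans() with three centers stands in for whatever clustering you ran, and the names vars_of_interest and cluster_summary are made up for illustration. It also assumes dplyr 1.0.0 or later, which provides across().

```r
library(dplyr)

set.seed(42)  # kmeans() starts from random centers

# Assumption: iris stands in for the real data, k-means for the real clustering
num_vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
km <- kmeans(scale(iris[, num_vars]), centers = 3)

# Attach the cluster number to every row
clustered <- iris %>%
  mutate(cluster = factor(km$cluster))

# Hypothetical choice of variables to summarize
vars_of_interest <- c("Sepal.Length", "Petal.Length")

# One row per cluster: cluster size, then mean and max of each chosen variable
cluster_summary <- clustered %>%
  group_by(cluster) %>%
  summarize(n = n(),
            across(all_of(vars_of_interest),
                   list(mean = mean, max = max)),
            .groups = "drop")

print(cluster_summary)
```

The result has one row per cluster and columns named like Sepal.Length_mean and Sepal.Length_max. Adding more entries to the list inside across() (for example sd = sd) adds more statistics without repeating the variable names.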