Tags: r, output, cluster-computing

Is there a good way to combine the results from several kmeans cluster outputs into a single data frame?


I am running a cluster analysis on a large data set, slicing and dicing it into various parts with a variety of methods, all with the goal of getting the best possible segmentation out of kmeans. The result is 15 separate kmeans objects. It would be very helpful to turn these into a single table so I can view them all at once, including a title identifying each model and key statistics about each model. Any advice? An example of what is printed to the console in RStudio for one model is below. I've searched for packages and have not had any luck. Thanks for any help you can provide!

 K-means clustering with 5 clusters of sizes 11356, 4621, 3380, 7455, 4381

Cluster means:
  PRCT_DIFF_LOCK_ON PRCT_FRONT_PTO_ON PRCT_REAR_PTO_ON PRCT_MFWD_ON
1       0.045629787      0.0006149385       0.05848930   0.80521712
2       0.006848544      0.0036244639       0.15807745   0.06906081
3       0.390860459      0.0004615964       0.07576421   0.79353567
4       0.040412934      0.0048262841       0.11052730   0.48966547
5       0.053424999      0.0149324570       0.45581038   0.64261907
Within cluster sum of squares by cluster:
    [1] 665.8571 568.6334 264.2810 554.8512 457.3876
     (between_SS / total_SS =  55.5 %)

Solution

  • The output of kmeans() is a list, so individual components can be extracted with $. For example, the cluster means are stored in the centers element:

    k2$centers
        Murder    Assault   UrbanPop       Rape
    1  1.004934  1.0138274  0.1975853  0.8469650
    2 -0.669956 -0.6758849 -0.1317235 -0.5646433
    
    

    The broom package can summarise the output as a data.frame/tibble: tidy() gives one row per cluster and glance() gives one row per model.

    library(broom)
    tidy(k2)
    # A tibble: 2 x 7
      Murder Assault UrbanPop   Rape  size withinss cluster
       <dbl>   <dbl>    <dbl>  <dbl> <int>    <dbl> <fct>  
    1  1.00    1.01     0.198  0.847    20     46.7 1      
    2 -0.670  -0.676   -0.132 -0.565    30     56.1 2      
    glance(k2)
    # A tibble: 1 x 4
      totss tot.withinss betweenss  iter
      <dbl>        <dbl>     <dbl> <int>
    1   196         103.      93.1     1
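
To get one table covering several models, one possible approach is to keep the fitted objects in a named list and stack the glance() output of each. The sketch below assumes the models can be collected this way; the list `models`, its names (`k2`, `k3`, `k4`) and the added `pct_between` column are illustrative choices, not part of the original question or answer. It uses `dplyr::bind_rows()` with `.id` so that the list names become the model labels.

```r
library(broom)   # glance() method for kmeans objects
library(dplyr)   # bind_rows()

df <- scale(na.omit(USArrests))

# Hypothetical named list standing in for the 15 fitted models
set.seed(42)
models <- list(
  k2 = kmeans(df, centers = 2, nstart = 25),
  k3 = kmeans(df, centers = 3, nstart = 25),
  k4 = kmeans(df, centers = 4, nstart = 25)
)

# One row per model; the list names end up in the "model" column
model_summary <- bind_rows(lapply(models, glance), .id = "model")

# Same ratio that print() reports as between_SS / total_SS
model_summary$pct_between <- 100 * model_summary$betweenss / model_summary$totss

model_summary
```

Giving each list element a descriptive name is what supplies the "title" for each row, so the 15 real models would just need to be added to the list under whatever labels are useful.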
    

    Reproducible example:

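    # kmeans() lives in the stats package, which is attached by default,
    # so library(cluster) is not needed here
    df <- USArrests
    df <- na.omit(df)   # drop rows with missing values
    df <- scale(df)     # standardise each column to mean 0, sd 1
    k2 <- kmeans(df, centers = 2, nstart = 25)  # 25 random starts, best one kept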
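
If a per-cluster view across all models is wanted instead (centers, sizes and within-cluster sums of squares side by side), the same idea can be sketched in base R without extra packages. As before, the list `models` and its names are hypothetical placeholders for the real fitted objects; only the documented components of a kmeans object (`centers`, `size`, `withinss`) are used.

```r
df <- scale(na.omit(USArrests))

# Hypothetical named list standing in for the real fitted models
set.seed(42)
models <- list(
  k2 = kmeans(df, centers = 2, nstart = 25),
  k3 = kmeans(df, centers = 3, nstart = 25)
)

# Build one data frame per model, with one row per cluster
cluster_rows <- lapply(names(models), function(nm) {
  m <- models[[nm]]
  data.frame(
    model    = nm,                   # label taken from the list name
    cluster  = seq_along(m$size),    # cluster index within this model
    size     = m$size,               # number of observations per cluster
    withinss = m$withinss,           # within-cluster sum of squares
    as.data.frame(m$centers)         # one column per variable (cluster means)
  )
})

# Stack the per-model tables into a single data frame
cluster_summary <- do.call(rbind, cluster_rows)
rownames(cluster_summary) <- NULL

cluster_summary
```

This only stacks cleanly when every model was fitted on the same set of variables; models built on different columns would need their centers handled separately.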