Tags: r, for-loop, hierarchical-clustering

Store the values returned by a function in a vector


I want to write a function that first cuts a tree (from hclust) into 2 to 13 groups (12 different cutree results), then calculates the adjusted Rand index (randIndex) between each of these 12 clusterings and a vector I have already stored, and finally stores the 12 adjusted Rand index values in a vector so I can compare them. All I have so far is:

for(i in 2:13){
 a <- cutree(hclust1, k=i)
 randIndex(stored_vector, a)
}

where hclust1 is the hierarchical clustering output and stored_vector is the stored vector I mentioned. I am completely new to programming and would appreciate some help. Thank you.


Solution

  • Does this work for you?

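    library(tidyverse)
    library(fossil) # rand.index() is the unadjusted Rand index; adj.rand.index() is the adjusted variant
    
    # example dataset for cutree(); replace it with your own clustering
    hc <- hclust(dist(USArrests))
    
    # numbers of groups to cut into; change k to your desired vector
    k <- 2:13
    vec <- cutree(hc, k = k) # matrix with one column per k, columns named by k
    
    # create an empty data frame to collect the results
    df <- tibble(i = numeric(), j = numeric(), result = numeric())
    
    # nested for loops: Rand index between every pair of cuts
    for (i in k) {
      for (j in k) {
        # index columns by name so this still works if k does not start at 2
        result <- rand.index(vec[, as.character(i)], vec[, as.character(j)])
        df <- df %>%
          add_row(i = i, j = j, result = result)
      }
    }
    
    # view result, dropping self-comparisons and repeated values
    df %>% 
      filter(result != 1) %>%
      distinct(result, .keep_all = TRUE) %>%
      view()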
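
The code above compares the cuts with each other. If the goal is what the question describes (one adjusted Rand index per k against a vector you already have, collected in a single vector), a minimal sketch could look like the following. It is only an illustration under some assumptions: `adjusted_rand()` is a hypothetical base-R helper written here so the example runs without extra packages, and `stored_vector` is faked with a 4-group cut of `USArrests` as a stand-in for your own data. With your own setup you would probably call your `randIndex()` in its place.

```r
# Hypothetical helper (not from any package): adjusted Rand index
# computed from the contingency table of two label vectors.
adjusted_rand <- function(x, y) {
  tab <- table(x, y)
  n_pairs <- function(n) sum(choose(n, 2))
  both <- n_pairs(tab)            # pairs grouped together in both partitions
  rows <- n_pairs(rowSums(tab))   # pairs grouped together in x
  cols <- n_pairs(colSums(tab))   # pairs grouped together in y
  expected <- rows * cols / choose(sum(tab), 2)
  (both - expected) / ((rows + cols) / 2 - expected)
}

# Stand-ins for the objects named in the question (assumptions).
hclust1 <- hclust(dist(USArrests))
stored_vector <- cutree(hclust1, k = 4)

ks <- 2:13

# Option 1: vapply() returns the 12 values directly as a numeric vector.
ari <- vapply(
  ks,
  function(k) adjusted_rand(stored_vector, cutree(hclust1, k = k)),
  numeric(1)
)
names(ari) <- ks  # label each value with its k

# Option 2: the same thing as a for loop, filling a preallocated vector.
ari_loop <- numeric(length(ks))
for (idx in seq_along(ks)) {
  a <- cutree(hclust1, k = ks[idx])
  ari_loop[idx] <- adjusted_rand(stored_vector, a)
}

ari
```

The loop in the question computes each value but never assigns it anywhere, so nothing is kept. Assigning into `ari_loop[idx]` (or letting `vapply()` collect the return values) is the piece that was missing.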