r distance hclust

Define a data.frame as a distance and perform hierarchical clustering in R


ggg <- data.frame(row.names=c("a","b","c","d","e"),var1=c("0","0","0","0","0"),var2=c("1","1","1","1","2"))

ggg_dist <- as.matrix(ggg) %>% as.dist(.)

Warning message:
In as.dist.default(.) : non-square matrix

class(ggg_dist)
[1] "dist"

ggg_dist
Warning message:
In df[row(df) > col(df)] <- x :
  number of items to replace is not a multiple of replacement length

 h_ggg <- hclust(ggg_dist,method="average")

Error in hclust(ggg_dist, method = "average") : 
  'D' must have length (N \choose 2).

I want to perform hierarchical clustering on ggg. ggg_dist is built from ggg, and class() confirms it is a "dist" object, but hclust() fails on it with the error above. How can I solve this?

I tried the approach from "How to convert data.frame into distance matrix for hierarchical clustering?", but I get the same error when I try to print ggg_dist.


Solution

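  • You can use the function dist, which computes the pairwise distances between the rows of ggg. as.dist only converts a matrix that is already a square distance matrix, which the 5 x 2 matrix built from ggg is not: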

    ggg_dist <- dist(ggg, method = "euclidean")
    

    Result:

    ggg_dist
      a b c d
    b 0      
    c 0 0    
    d 0 0 0  
    e 1 1 1 1
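
To connect this back to the original goal, here is a minimal sketch of the whole workflow, from the data frame to the clustering. It assumes the columns can be stored as numbers rather than as character strings (a change from the question's data), and the cutree() step with k = 2 is only an illustration of how the tree might be used, not something the question asked for.

```r
# Example data from the question, with numeric columns
# (assumption: the original stores these values as character strings).
ggg <- data.frame(
  row.names = c("a", "b", "c", "d", "e"),
  var1 = c(0, 0, 0, 0, 0),
  var2 = c(1, 1, 1, 1, 2)
)

# dist() computes pairwise distances between the rows of the table.
ggg_dist <- dist(ggg, method = "euclidean")

# A dist object for N rows holds N * (N - 1) / 2 values,
# which is the length that hclust() checks for.
n <- nrow(ggg)
stopifnot(length(ggg_dist) == n * (n - 1) / 2)

# hclust() accepts the distances without the 'D' length error.
h_ggg <- hclust(ggg_dist, method = "average")

# Illustrative only: cut the tree into two groups.
groups <- cutree(h_ggg, k = 2)
print(groups)
```

With these five rows, a to d are identical and only e differs, so cutting into two groups should separate e from the rest. plot(h_ggg) draws the corresponding dendrogram.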