I have a mixed data set (categorical and continuous variables) and I'd like to run hierarchical clustering on it using Gower distance.
I based my code on an example from https://www.r-bloggers.com/hierarchical-clustering-in-r-2/, which uses base R dist() for Euclidean distance. Since dist() doesn't compute Gower distance, I tried philentropy::distance() instead, but it fails with an error.
Thanks for any help!
# Data
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
# Hierarchical clustering with Euclidean distance - works
clusters <- hclust(dist(mtcars[, 1:2]))
plot(clusters)
# Hierarchical clustering with Gower distance - doesn't work
library(philentropy)
clusters <- hclust(distance(mtcars[, 1:2], method = "gower"))
plot(clusters)
The error is in the distance function itself.
I don't know if it's intentional or not, but the current implementation of philentropy::distance with the "gower" method cannot handle mixed data types: its first operation is to transpose the data.frame, which produces a character matrix, and that matrix then triggers the type error when it is passed to the DistMatrixWithoutUnit function.
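The transposition effect can be reproduced in base R alone. This is only a sketch of the mechanism described above, not a trace of philentropy's internals: it shows that t() on a data.frame containing a factor column yields a character matrix.

```r
# Same two columns as in the question: numeric mpg, factor cyl
x <- mtcars[, 1:2]
x$cyl <- as.factor(x$cyl)

# t() converts the data.frame with as.matrix(), which falls back to
# character as soon as one column is not numeric
tx <- t(x)
is.numeric(tx)    # FALSE
is.character(tx)  # TRUE
```

With only numeric columns (for example mtcars[, c("mpg", "disp")]) the transposed matrix stays numeric, which is why the problem only shows up with mixed data.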
You might try using the daisy function from the cluster package instead.
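library(cluster)
# Keep mpg (numeric) and cyl (converted to a factor)
x <- mtcars[, 1:2]
x$cyl <- as.factor(x$cyl)
# daisy() handles mixed column types; named gower_dist to avoid masking base dist()
gower_dist <- daisy(x, metric = "gower")
cls <- hclust(gower_dist)
plot(cls)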
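To see what the Gower metric does with these two columns, here is a minimal sketch that recomputes one pairwise distance by hand and compares it with daisy's result. The helper gower_pair is a hypothetical name introduced for illustration, and it assumes the standard definition for this case: numeric differences scaled by the column's range, a 0/1 mismatch for factors, and an unweighted average over the variables.

```r
library(cluster)

x <- mtcars[, 1:2]
x$cyl <- as.factor(x$cyl)

# Hypothetical helper: Gower distance between rows i and j of x
gower_pair <- function(df, i, j) {
  # numeric column: absolute difference divided by the column's range
  d_mpg <- abs(df$mpg[i] - df$mpg[j]) / diff(range(df$mpg))
  # factor column: 0 if the levels match, 1 otherwise
  d_cyl <- as.numeric(df$cyl[i] != df$cyl[j])
  # unweighted mean over the two variables
  mean(c(d_mpg, d_cyl))
}

d <- as.matrix(daisy(x, metric = "gower"))

# Compare the hand-computed value with daisy for rows 1 and 3
all.equal(gower_pair(x, 1, 3), unname(d[1, 3]))
```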
EDIT: For future reference, it seems that philentropy will be updated to include better type handling in the next version. From the vignette:
In future versions of philentropy I will optimize the distance() function so that internal checks for data type correctness and correct input data will take less termination time than the base dist() function.