
Hierarchical clustering: "must have n >= 2 objects to cluster" in R


I'm following this guide and using readxl to import my data. I want to use hierarchical clustering to group the studies together; there are 12 observations. Some studies have missing values and some have no data at all. Following the guide, I ran:

> df <-read_excel("MDO.xlsx")
> df <- na.omit(df)
> df <- scale(df)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
> d <- dist(df, method = "euclidean")
Warning message:
In dist(df, method = "euclidean") : NAs introduced by coercion
> hc1 <- hclust(d, method = "complete" )
Error in hclust(d, method = "complete") : 
 must have n >= 2 objects to cluster

I'm fairly new to R and have never used clustering before, so I'm not sure how to fix these errors.


Solution

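  • scale() fails because at least one column is not numeric (presumably the first one, holding the study names). na.omit() drops every row that has at least one missing value, which most likely leaves fewer than two rows, hence the hclust() error. Try this instead, which only removes the rows where everything is missing: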

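    # Read data:
    library(readxl)
    df <- read_excel("MDO.xlsx")
    # Convert the tibble to a plain data.frame
    df <- as.data.frame(df)
    # Remove rows where every value after the first column is NA
    # (the first column is assumed to hold the study names)
    df <- df[!apply(is.na(df[, -1]), 1, all), ]
    # Scale the numeric columns
    df[, -1] <- apply(df[, -1], 2, scale)
    # Distance and cluster, using the numeric columns only
    d <- dist(df[, -1], method = "euclidean")
    hc1 <- hclust(d, method = "complete")
    # Label the leaves with the study names from the first column
    plot(hc1, labels = as.character(df[, 1]))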
    

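To see why na.omit() can leave too few rows, here is a small sketch on made-up data. The `toy` data frame, its column names and its values are invented for illustration and are not taken from MDO.xlsx. It assumes, as above, that the first column holds the labels and the rest are numeric.

```r
# Toy data, invented for illustration: one label column, three numeric columns.
toy <- data.frame(
  study = c("A", "B", "C", "D", "E"),
  x = c(1, 2, 3, NA, 8),
  y = c(2, NA, 6, NA, 9),
  z = c(NA, 1, 5, NA, NA),
  stringsAsFactors = FALSE
)

# na.omit() drops every row with at least one NA, so only study "C" survives.
n_complete <- nrow(na.omit(toy))

# Dropping only the rows whose numeric columns are all NA removes just "D".
kept <- toy[!apply(is.na(toy[, -1]), 1, all), ]

# scale() ignores NAs when centering and scaling. dist() skips the missing
# coordinates of each pair and rescales the sum by the share of columns used.
d <- dist(scale(kept[, -1]), method = "euclidean")

# hclust() needs a distance matrix without NAs. That holds here because
# every remaining pair of rows shares at least one observed column (x).
hc <- hclust(d, method = "complete")
```

With real data it is worth checking `anyNA(d)` before calling hclust(): if two studies have no observed column in common, their distance is NA and hclust() will stop with an error. The dendrogram can then be drawn with `plot(hc, labels = kept$study)`.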