Hopkins function in comato R returns error message


I want to test the Hopkins statistic returned by the Hopkins.index() function in the comato package, using the following reproducible code:

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
#
# SIMPLE EXPERIMENTS TO CHECK THE EFFECT OF CLUSTERABILITY ON HOPKINS STATISTIC
#
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

###################################################################################
# CREATE THREE DATASETS OF LOW, MEDIUM AND HIGH CLUSTERABILITY
#################################################################################

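library(data.table)  # data.table() and the .() column selection used below
library(ggplot2)     # ggplot(), aes(), geom_point(), ggtitle()
library(ggthemes)    # theme_economist()

set.seed(123)  # arbitrary seed so the simulated data are identical on every run

low1 <- data.table(V1 = rnorm(500, mean = 0, sd = 1), V2 = rnorm(500, mean = 0, sd = 1), Cluster = as.factor(rep(1, 500)))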
low2 <- data.table(V1 = rnorm(500, mean = 0, sd = 1), V2 = rnorm(500, mean = 0, sd = 1), Cluster = as.factor(rep(2, 500)))
low <- rbind(low1, low2)


#---------------------------------------------------------------------------------------------
medium1 <- data.table(V1 = rnorm(500, mean = 0, sd = 1), V2 = rnorm(500, mean = 0, sd = 1), Cluster = as.factor(rep(1, 500)))
medium2 <- data.table(V1 = rnorm(500, mean = 2, sd = 1), V2 = rnorm(500, mean = 2, sd = 1), Cluster = as.factor(rep(2, 500)))
medium <- rbind(medium1, medium2)

#----------------------------------------------------------------------------------------------

high1 <- data.table(V1 = rnorm(500, mean = 0, sd = 1), V2 = rnorm(500, mean = 0, sd = 1), Cluster = as.factor(rep(1, 500)))
high2 <- data.table(V1 = rnorm(500, mean = 4, sd = 1), V2 = rnorm(500, mean = 4, sd = 1), Cluster = as.factor(rep(2, 500)))
high <- rbind(high1, high2)

#########################################################################################
# VISUALIZE THE CLUSTERS
##########################################################################################


#---------------------------------------------------------------
# LOW
#--------------------------------------------------------------

ggplot(low, aes(V1, V2, colour = Cluster )) +
  geom_point(size = 2.5, alpha = 0.5) + ggtitle("Low Clusterability") + theme_economist()

[Figure: scatter plot "Low Clusterability", V1 vs V2, coloured by Cluster]

#---------------------------------------------------------------
# MEDIUM
#--------------------------------------------------------------

ggplot(medium, aes(V1, V2, colour = Cluster )) +
  geom_point(size = 2.5, alpha = 0.5) + ggtitle("Medium Clusterability") + theme_economist()

[Figure: scatter plot "Medium Clusterability", V1 vs V2, coloured by Cluster]

#---------------------------------------------------------------
# HIGH
#--------------------------------------------------------------

ggplot(high, aes(V1, V2, colour = Cluster )) +
  geom_point(size = 2.5, alpha = 0.5) + ggtitle("High Clusterability") + theme_economist()

[Figure: scatter plot "High Clusterability", V1 vs V2, coloured by Cluster]

##########################################################################################
# DETERMINE THE HOPKINS STATISTIC FOR EACH OF THE AFOREMENTIONED CASES
############################################################################################
    
library(comato)

hopkins_low_comato <- Hopkins.index(low[, .(V1, V2)])

hopkins_medium_comato <- Hopkins.index(medium[, .(V1, V2)])

hopkins_high_comato <- Hopkins.index(high[, .(V1, V2)])

However, each of these calls fails with an error message:

[Image: screenshot of the error messages]


Solution

  • Hopkins.index() accepts only a matrix as input, so passing a data.table (or a data frame) raises an error. Convert the data to a matrix first and pass that to the function:

    low_matrix <- as.matrix(low[, .(V1, V2)])
    hopkins_low_comato <- Hopkins.index(low_matrix)
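
The same conversion can be applied to all three datasets at once. The following is only a minimal sketch: `make_two_blobs()` and `to_numeric_matrix()` are hypothetical helpers written for this illustration (they are not part of comato), the data are rebuilt as plain data frames with an arbitrary seed, and the call to `Hopkins.index()` is skipped if comato is not installed.

```r
# Hypothetical helper: two Gaussian blobs of n points each, the second shifted by `shift`.
make_two_blobs <- function(n, shift) {
  data.frame(
    V1 = c(rnorm(n), rnorm(n, mean = shift)),
    V2 = c(rnorm(n), rnorm(n, mean = shift))
  )
}

# Hypothetical helper: select columns and coerce them to a numeric matrix.
# as.data.frame() makes the column selection behave the same way for
# data frames and data.tables.
to_numeric_matrix <- function(data, cols) {
  m <- as.matrix(as.data.frame(data)[, cols, drop = FALSE])
  stopifnot(is.matrix(m), is.numeric(m))
  m
}

set.seed(123)  # arbitrary seed, an assumption for reproducibility

datasets <- list(
  low    = make_two_blobs(500, shift = 0),
  medium = make_two_blobs(500, shift = 2),
  high   = make_two_blobs(500, shift = 4)
)

# One numeric matrix per dataset, ready to be passed to Hopkins.index()
matrices <- lapply(datasets, to_numeric_matrix, cols = c("V1", "V2"))

# Only call comato if it is available on this machine
if (requireNamespace("comato", quietly = TRUE)) {
  hopkins_comato <- lapply(matrices, comato::Hopkins.index)
  print(hopkins_comato)
}
```

Checking `is.matrix()` before the call makes it easy to see whether an error comes from the input type or from something else.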