r, cluster-analysis

Assessing clusterability of a dataset with the Hopkins statistic


I would like to assess the clusterability of my dataset of over 80,000 variables with the Hopkins statistic. I started with n = 80,000. As that is obviously too high, I reduced n to 10, but I still received the same error message:

Error: cannot allocate vector of size 2511.3 Gb

clustab <- get_clust_tendency(WKA_ohneJB, 10, graph = TRUE, gradient = list(low = "red", mid = "white", high = "blue"))

Apart from solving this issue, I have a further question: what is the largest value of n that can be used?

3. Import csv file

WKA_ohneJB <- read.csv("WKA_ohneJB.csv", header=TRUE, sep = ";", stringsAsFactors = FALSE)

4. Verify structure

str(WKA_ohneJB)

5. Descriptive Statistics for column BASKETS_NZ

mean(WKA_ohneJB[,"BASKETS_NZ"]) # 1.023035

median(WKA_ohneJB[,"BASKETS_NZ"]) # 1

var(WKA_ohneJB[,"BASKETS_NZ"]) # 0.06871633

sd(WKA_ohneJB[,"BASKETS_NZ"]) # 0.262138

range(WKA_ohneJB[,"BASKETS_NZ"]) # 0 49

hist(WKA_ohneJB[,"BASKETS_NZ"])

6. Summary descriptive statistics

summary(WKA_ohneJB)

7. Assessing clusterability of data set

clustab <- get_clust_tendency(WKA_ohneJB, 10, graph = TRUE, gradient = list(low = "red", mid = "white", high = "blue"))

clustab$hopkins_stat



Solution

  • The error you get indicates a failure to obtain memory (more on memory limits in R).

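    To check or raise the amount of memory available to R on Windows, you can use memory.limit (note that from R 4.2.0 onwards it is only a stub and no longer changes the limit). With size = NA it reports the current limit: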

    memory.limit(size = NA)
    

    As mentioned in the documentation, size is

    numeric. If NA report the memory limit, otherwise request a new limit, in Mb. Only values of up to 4095 are allowed on 32-bit R builds, but see ‘Details’.

    The Details section says:

    If 32-bit R is run on most 64-bit versions of Windows the maximum value of obtainable memory is just under 4Gb. For a 64-bit versions of R under 64-bit Windows the limit is currently 8Tb.

    To see how much memory R has obtained so far, simply use

    memory.size(max = TRUE)
    

    (If TRUE the maximum amount of memory obtained from the OS is reported)

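    So, if you have 64-bit Windows 10 with 64-bit R, passing a larger size (in Mb) to memory.limit may fix your problem, provided the machine actually has enough memory for the requested allocation.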
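
If raising the limit is not enough, reducing the amount of data passed to get_clust_tendency may be worth trying. The following is only a rough sketch, under the assumption that the large allocation comes from the full pairwise distance matrix built for the heatmap when graph = TRUE (which would be consistent with the error not changing when n was lowered). The helper names dist_size_gb and hopkins_on_sample, the default of 1000 sampled rows, and the use of the built-in iris data as a stand-in for WKA_ohneJB are all illustrative assumptions, not part of the original question or answer.

```r
# Approximate size (in GB) of the pairwise distance object that
# stats::dist() stores for n_rows observations: n * (n - 1) / 2 doubles.
dist_size_gb <- function(n_rows, bytes_per_value = 8) {
  n_rows <- as.numeric(n_rows)  # avoid integer overflow for large n
  n_rows * (n_rows - 1) / 2 * bytes_per_value / 1024^3
}

# Hypothetical helper: draw a random subset of rows, keep only the numeric
# columns, and compute the Hopkins statistic without the distance heatmap.
hopkins_on_sample <- function(data, sample_rows = 1000, n = 10, seed = 123) {
  num <- data[, sapply(data, is.numeric), drop = FALSE]
  num <- stats::na.omit(num)
  sample_rows <- min(sample_rows, nrow(num))
  set.seed(seed)
  sub <- num[sample(nrow(num), sample_rows), , drop = FALSE]
  # graph = FALSE skips the heatmap of the distance matrix
  factoextra::get_clust_tendency(sub, n = n, graph = FALSE)$hopkins_stat
}

# Estimated size of a full distance matrix for 80,000 rows
print(dist_size_gb(80000))

# Only run the Hopkins example if factoextra is installed
if (requireNamespace("factoextra", quietly = TRUE)) {
  h <- hopkins_on_sample(iris, sample_rows = 100, n = 10)  # iris as stand-in
  print(h)
}
```

With the real data, the call would look like hopkins_on_sample(WKA_ohneJB, sample_rows = 1000, n = 10). Repeating it with a few different seeds gives an idea of how stable the statistic is across subsamples.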