I would like to assess the clusterability of my dataset of over 80,000 observations with the Hopkins statistic. I started with n = 80,000. As n is obviously too high, I reduced it to 10, but I still received the same error message:
Error: cannot allocate vector of size 2511.3 Gb
clustab <- get_clust_tendency(WKA_ohneJB, 10, graph = TRUE, gradient = list(low = "red", mid = "white", high = "blue"))
Apart from solving the issue, I have a further question: what is the highest value of n you can use?
WKA_ohneJB <- read.csv("WKA_ohneJB.csv", header=TRUE, sep = ";", stringsAsFactors = FALSE)
str(WKA_ohneJB)
mean(WKA_ohneJB[,"BASKETS_NZ"]) # 1.023035
median(WKA_ohneJB[,"BASKETS_NZ"]) # 1
var(WKA_ohneJB[,"BASKETS_NZ"]) # 0.06871633
sd(WKA_ohneJB[,"BASKETS_NZ"]) # 0.262138
range(WKA_ohneJB[,"BASKETS_NZ"]) # 0 49
hist(WKA_ohneJB[,"BASKETS_NZ"])
summary(WKA_ohneJB)
clustab <- get_clust_tendency(WKA_ohneJB, 10, graph = TRUE, gradient = list(low = "red", mid = "white", high = "blue"))
clustab$hopkins_stat # the Hopkins statistic lives in the result of get_clust_tendency(), not in the data frame
The error you get indicates that R failed to obtain enough memory from the system (more on memory limits in R).
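Note that reducing n probably cannot help here: as far as I can tell from factoextra's behaviour (treat this as an assumption), with graph = TRUE get_clust_tendency() also computes the full pairwise dissimilarity matrix of the data for the ordered dissimilarity image, and that object grows with the number of rows, not with n. You can estimate its size yourself:
# Rough size of the dist() object for the full data set:
# dist() stores n*(n-1)/2 distances as 8-byte doubles.
n_rows <- nrow(WKA_ohneJB)
n_rows * (n_rows - 1) / 2 * 8 / 2^30  # approximate size in Gb
As for your second question: if I remember the implementation correctly, n must be smaller than the number of rows of the data, since get_clust_tendency() samples n points from it.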
To check or increase the amount of memory allocated to R (Windows only) you can use memory.limit:
memory.limit(size = NA)
As mentioned in the documentation, size is
numeric. If NA report the memory limit, otherwise request a new limit, in Mb. Only values of up to 4095 are allowed on 32-bit R builds, but see ‘Details’.
Reading the Details section:
If 32-bit R is run on most 64-bit versions of Windows the maximum value of obtainable memory is just under 4Gb. For 64-bit versions of R under 64-bit Windows the limit is currently 8Tb.
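For example, on 64-bit R under 64-bit Windows you could request a higher limit (the value is in Mb; the 16000 below is just an illustrative figure):
memory.limit(size = NA)      # report the current limit, in Mb
memory.limit(size = 16000)   # request a new limit of roughly 16 Gb (Windows only)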
Or, to see how much memory R has obtained from the OS so far, simply use
memory.size(max = TRUE)
(if max = TRUE, the maximum amount of memory obtained from the OS is reported).
So, if you have 64-bit Windows 10 and 64-bit R, increasing the memory allocated to R should fix your problem.
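If your machine cannot actually provide that much memory, a workaround (a minimal sketch, assuming all columns of WKA_ohneJB are numeric and that a random subsample is acceptable for assessing cluster tendency) is to compute the Hopkins statistic on a subsample and skip the dissimilarity image:
library(factoextra)

set.seed(123)                           # make the subsample reproducible
idx  <- sample(nrow(WKA_ohneJB), 1000)  # e.g. 1,000 of the 80,000+ rows
samp <- WKA_ohneJB[idx, ]

# graph = FALSE skips building the full pairwise dissimilarity matrix
clustab <- get_clust_tendency(samp, n = 100, graph = FALSE)
clustab$hopkins_stat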