I am using the fpc package to determine the optimal number of clusters. The pamk() function takes a dissimilarity matrix as an argument and does not require the user to specify k. According to the documentation:
pamk() This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.
But when I input two very similar matrices, foo and bar (data below), the function errors out on the second matrix (bar):
Error in pam(sdata, k, diss = diss, ...) :
Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2
What could be causing this error, given that the input matrices are basically the same? For example:
foo works!
hc <- hclust(as.dist(foo))
plot(hc)
pamk.best <- fpc::pamk(foo)
pamk.best$nc
[1] 2
bar does not
hc <- hclust(as.dist(bar))
plot(hc, main = 'bar dendrogram')
pamk.best <- fpc::pamk(bar)
Error in pam(sdata, k, diss = diss, ...) :
Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2
Any suggestions would be helpful!
dput(foo)
structure(c(0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9,
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9,
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0), .Dim = c(14L, 14L), .Dimnames = list(
c("etc", "etc", "etc", "etc", "etc", "etc", "etc", "similares",
"etc", "etc", "etc", "etc", "etc", "similares"), NULL))
dput(bar)
structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez",
"similares", "similares", "similares", "similares"), NULL))
bar is a 5 x 5 matrix, so n = 5, and max(krange) has to be <= n-1, thus 4. The default krange is 2:10, hence the error. foo is 14 x 14, so the default 2:10 stays within 1..13, which is why it works. You need to pass an appropriate krange; try:
pamk.best <- fpc::pamk(bar, krange = 2:(ncol(bar) - 1))
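If you run this over many matrices of different sizes, it may be easier to clip the candidate range once rather than hard-coding it per matrix. Here is a minimal sketch in base R, assuming the only constraint that matters is the one in the error message (k in 1..n-1). The helper clamp_krange() is a hypothetical name of my own, not something fpc provides, and bar is rebuilt from the dput() output above without its dimnames.

```r
# Hypothetical helper (not part of fpc): keep only the candidate k values
# that pam() accepts for n observations, i.e. k in 1..(n - 1).
clamp_krange <- function(n, krange = 2:10) {
  valid <- krange[krange >= 1 & krange <= n - 1]
  if (length(valid) == 0) {
    stop("no candidate k lies in 1..(n - 1) for n = ", n)
  }
  valid
}

# bar from the question (5 x 5), dimnames omitted for brevity
bar <- structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
                   0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L))

k_bar <- clamp_krange(nrow(bar))  # default 2:10 clipped to n - 1 = 4
k_foo <- clamp_krange(14L)        # foo has n = 14, so 2:10 is kept as is

print(k_bar)
print(k_foo)
```

The clipped range can then be passed on, e.g. fpc::pamk(bar, krange = k_bar). The helper stops with an error when nothing is left (for the default range, n below 3), since there would be no k to try.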