Search code examples
reda

Error when using 'mclust' package for mixture model estimation


When trying to perform a univariate mixture model estimation, R package mclust produces the following output with an error:

----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm 
----------------------------------------------------

Mclust V (univariate, unequal variance) model with 9 components:

    log.likelihood     n               df               BIC              ICL
 -25576.3596860173 53940 26.0000000000001 -51436.0056895487 -84761.892259584

Clustering table:
    1     2     3     4     5     6     7     8     9 
 3081  8923  8177  3017  8742  6568 10378  3650  1404 
Error in object[as.character(G), modelNames, drop = FALSE] : 
  incorrect number of dimensions

Reproducible example:

library(mclust)

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- log10(diamonds$price)

# determine number of components
mc <- Mclust(myData)
print(summary(mc))

Additional diagnostic information:

> str(mc)
List of 15
 $ call          : language Mclust(data = myData)
 $ data          : num [1:53940, 1] 2.51 2.51 2.51 2.52 2.53 ...
 $ modelName     : chr "V"
 $ n             : int 53940
 $ d             : num 1
 $ G             : int 9
 $ BIC           : num [1:9, 1:2] -64689 -57330 -54802 -52807 -52721 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : chr [1:9] "1" "2" "3" "4" ...
 .. ..$ : chr [1:2] "E" "V"
 ..- attr(*, "G")= num [1:9] 1 2 3 4 5 6 7 8 9
 ..- attr(*, "modelNames")= chr [1:2] "E" "V"
 ..- attr(*, "oneD")= logi TRUE
 $ bic           : num -51436
 $ loglik        : num -25576
 $ df            : num 26
 $ hypvol        : num NA
 $ parameters    :List of 4
 ..$ Vinv    : NULL
 ..$ pro     : num [1:9] 0.0606 0.1558 0.1628 0.0411 0.1655 ...
 ..$ mean    : Named num [1:9] 2.7 2.86 3.03 3.24 3.39 ...
 .. ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 ..$ variance:List of 5
 .. ..$ modelName: chr "V"
 .. ..$ d        : num 1
 .. ..$ G        : int 9
 .. ..$ sigmasq  : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 .. ..$ scale    : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 $ classification: num [1:53940] 1 1 1 1 1 1 1 1 1 1 ...
 $ uncertainty   : num [1:53940] 0.0132 0.0132 0.0134 0.015 0.0152 ...
 $ z             : num [1:53940, 1:9] 0.987 0.987 0.987 0.985 0.985 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : NULL
 .. ..$ : NULL
 - attr(*, "class")= chr "Mclust"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rebmix_2.6.1       ddst_1.03          evd_2.3-0          orthopolynom_1.0-5
 [5] polynom_1.3-8      mixtools_1.0.2     segmented_0.4-0.0  boot_1.3-11       
 [9] mclust_4.3         MASS_7.3-34       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.1      colorspace_1.2-4 digest_0.6.4     ggplot2_1.0.0   
 [5] grid_3.1.1       gtable_0.1.2     munsell_0.4.2    plyr_1.8.1      
 [9] proto_0.3-10     reshape2_1.4     scales_0.2.4     stringr_0.6.2   
[13] tools_3.1.1     

Solution

  • I have figured out myself what was the problem. The error message has been produced by another mclust function mclustModel(), which my code has been executing after the Mclust() call. The error was due to the fact that mclustModel() function requires a different type of object passed as a second argument. The correct call sequence, when using mclustModel() (instead of Mclust()), is shown below.

    mc <- mclustBIC(myData)
    print(summary(mc, myData, parameters = TRUE))
    plot(mc)
    
    bestModel <- mclustModel(myData, mc)
    print(summary(bestModel, myData))