I am trying to understand how to parallelize some of my code in R. In the following example I want to use k-means to cluster data with 2, 3, 4, 5 and 6 centers, using 20 random starts (`nstart`) for each, split into four parallel jobs of 5 starts. Here is the code:



    library(parallel)

    parallel.function <- function(i) {
        # centers = ?? is the open question: how do I vary it alongside nstart?
        kmeans(X[1:100, 100], centers = ??, nstart = i)
    }

    out <- mclapply(c(5, 5, 5, 5), FUN = parallel.function)

How can I parallelize over both the random starts and the number of centers at the same time? And how do I keep track of the outputs, assuming I want to keep every k-means result across all starts and centers, just to learn how?


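    library(parallel)
    # each element of 2:6 is passed as `centers`; the data go in as the named argument x
    mc <- mclapply(2:6, function(centers, x) kmeans(x, centers), x = X)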

    > summary(mc)
         Length Class  Mode
    [1,] 9      kmeans list
    [2,] 9      kmeans list
    [3,] 9      kmeans list
    [4,] 9      kmeans list
    [5,] 9      kmeans list
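To keep track of which fit belongs to which number of centers, one option is to name the input vector: `mclapply` returns a list named like its input, so each result can be looked up by its label. A minimal sketch, assuming a made-up 100 x 2 matrix in place of the question's `X`; the names `ks`, `cores` and `wss` are illustrative only:

```r
library(parallel)

# Hypothetical example data: 100 points in 2 dimensions (stand-in for the question's X).
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)

# Naming the input makes mclapply name its output the same way,
# so each fit can be looked up by its number of centers.
ks <- setNames(2:6, paste0("k", 2:6))

# Forking is not available on Windows, where mc.cores must be 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

mc <- mclapply(ks, function(centers, x) kmeans(x, centers), x = X, mc.cores = cores)

# Total within-cluster sum of squares for each k, keyed by the same names.
wss <- sapply(mc, function(fit) fit$tot.withinss)
print(wss)

# Retrieve a single fit by its label.
print(mc[["k4"]]$size)
```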

EDIT: As requested, here is the same approach over two variables, `nstart` and `centers`:

    (pars = expand.grid(i=1:3, cent=2:4))
      i cent
    1 1    2
    2 2    2
    3 3    2
    4 1    3
    5 2    3
    6 3    3
    7 1    4
    8 2    4
    9 3    4
    # turn each row of the grid into its own list, e.g. list(i = 1, cent = 2)
    pars2 <- lapply(seq_len(nrow(pars)), function(r) as.list(pars[r, ]))
    mc <- mclapply(pars2, function(p, x) kmeans(x, centers = p$cent, nstart = p$i), x = X)
    > summary(mc)
         Length Class  Mode
    [1,] 9      kmeans list
    [2,] 9      kmeans list
    [3,] 9      kmeans list
    [4,] 9      kmeans list
    [5,] 9      kmeans list
    [6,] 9      kmeans list
    [7,] 9      kmeans list
    [8,] 9      kmeans list
    [9,] 9      kmeans list
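The same grid idea can cover the original setup of 20 starts per number of centers, split into four jobs of 5, while keeping every fit and then picking the best one for each k by its `tot.withinss`. This is a sketch under assumptions: `X` is again made-up data, the columns `job`, `cent` and `nstart` and the objects `par_list`, `fits`, `best_idx` and `best_fits` are hypothetical, and `RNGkind("L'Ecuyer-CMRG")` is used so that each forked worker draws from its own random-number stream (reproducible for a fixed `mc.cores`):

```r
library(parallel)

# Hypothetical example data (stand-in for the question's X).
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)

# Parallel-safe RNG streams; mclapply's mc.set.seed = TRUE is the default.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# Every combination of job and number of centers; each job does 5 random
# starts, so 4 jobs give 20 starts per k.
pars <- expand.grid(job = 1:4, cent = 2:6)
pars$nstart <- 5

# One list element per row, so each worker receives its own parameters.
par_list <- lapply(seq_len(nrow(pars)), function(r) as.list(pars[r, ]))

cores <- if (.Platform$OS.type == "windows") 1L else 2L
fits <- mclapply(par_list, function(p, x) {
  kmeans(x, centers = p$cent, nstart = p$nstart)
}, x = X, mc.cores = cores)

# Keep every fit, and record its quality next to its parameters.
pars$tot.withinss <- vapply(fits, function(f) f$tot.withinss, numeric(1))

# For each k, select the job whose fit has the smallest tot.withinss.
best_idx <- sapply(split(seq_len(nrow(pars)), pars$cent),
                   function(rows) rows[which.min(pars$tot.withinss[rows])])
best_fits <- fits[best_idx]
names(best_fits) <- paste0("k", names(best_idx))

print(vapply(best_fits, function(f) f$tot.withinss, numeric(1)))
```

Because `pars` and `fits` share the same row order, `fits[[r]]` is always the fit described by `pars[r, ]`; nothing is discarded, and `best_fits` is only a selection from the full list.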

