c++, r, macos, rcpp, mlpack

How to use mlpack in my Rcpp code in macOS


I am trying to build an R package using mlpack. As suggested in this link, I am using the following C++ function:

#include <Rcpp/Rcpp>
#include <mlpack.h>

// Two include directories adjusted for my use of mlpack 3.4.2 on Ubuntu
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]

// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set, and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.

// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData("  0.0   0.0;" // Class 1.
                     "  0.3   0.4;"
                     "  0.1   0.0;"
                     "  0.1   0.3;"
                     " -0.2  -0.2;"
                     " -0.1   0.3;"
                     " -0.4   0.1;"
                     "  0.2  -0.1;"
                     "  0.3   0.0;"
                     " -0.3  -0.3;"
                     "  0.1  -0.1;"
                     "  0.2  -0.3;"
                     " -0.3   0.2;"
                     " 10.0  10.0;" // Class 2.
                     " 10.1   9.9;"
                     "  9.9  10.0;"
                     " 10.2   9.7;"
                     " 10.2   9.8;"
                     "  9.7  10.3;"
                     "  9.9  10.1;"
                     "-10.0   5.0;" // Class 3.
                     " -9.8   5.1;"
                     " -9.9   4.9;"
                     "-10.0   4.9;"
                     "-10.2   5.2;"
                     "-10.1   5.1;"
                     "-10.3   5.3;"
                     "-10.0   4.8;"
                     " -9.6   5.0;"
                     " -9.8   5.1;");


// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {

    mlpack::kmeans::KMeans<mlpack::metric::EuclideanDistance, 
                           mlpack::kmeans::RandomPartition> kmeans;

    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);

    return assignments;
}

If I first run Sys.setenv("PKG_LIBS"="-lmlpack") and then sourceCpp the above on Ubuntu Linux, it compiles successfully. However, I am unable to use it on macOS on an Apple M2 machine. I get the following error on macOS:

/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/mlpack/include/mlpack.h:52:10: fatal error: mlpack/core.hpp: No such file or directory
   52 | #include <mlpack/core.hpp>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated. 

I have installed the mlpack R package as well as the system mlpack using brew. It seems to me that R cannot find the mlpack headers that are located in /opt/homebrew/include/ on my system. Is there a way to point the compiler to them? I have tried brew link mlpack, which reports that linking is successful, but I still get the same compilation error. Additionally, I tried the following in R before calling sourceCpp, but no luck!

Sys.setenv("LDFLAGS"="-L/opt/homebrew/lib")
Sys.setenv("CPPFLAGS"="-I/opt/homebrew/include")
Sys.setenv("PKG_LIBS"="-lmlpack")

Please let me know if there is any way out for this in macOS.

P.S. Both R and RStudio are installed on my system using brew.


Solution

  • mlpack 4.2.0 is now on CRAN and ships exported headers we can use! A minimally modified version of your example follows.

    Code

    #include <Rcpp/Rcpp>
    #include <mlpack.h>
    
    #include <mlpack/methods/kmeans.hpp>
    
    // -- use C++17
    // [[Rcpp::plugins(cpp17)]]
    // -- use Armadillo, Ensmallen and mlpack headers
    // [[Rcpp::depends(RcppArmadillo)]]
    // [[Rcpp::depends(RcppEnsmallen)]]
    // [[Rcpp::depends(mlpack)]]
    
    // This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp
    
    // Generate dataset; written transposed because it's easier to read.
    arma::mat kMeansData("  0.0   0.0;" // Class 1.
                         "  0.3   0.4;"
                         "  0.1   0.0;"
                         "  0.1   0.3;"
                         " -0.2  -0.2;"
                         " -0.1   0.3;"
                         " -0.4   0.1;"
                         "  0.2  -0.1;"
                         "  0.3   0.0;"
                         " -0.3  -0.3;"
                         "  0.1  -0.1;"
                         "  0.2  -0.3;"
                         " -0.3   0.2;"
                         " 10.0  10.0;" // Class 2.
                         " 10.1   9.9;"
                         "  9.9  10.0;"
                         " 10.2   9.7;"
                         " 10.2   9.8;"
                         "  9.7  10.3;"
                         "  9.9  10.1;"
                         "-10.0   5.0;" // Class 3.
                         " -9.8   5.1;"
                         " -9.9   4.9;"
                         "-10.0   4.9;"
                         "-10.2   5.2;"
                         "-10.1   5.1;"
                         "-10.3   5.3;"
                         "-10.0   4.8;"
                         " -9.6   5.0;"
                         " -9.8   5.1;");
    
    
    // [[Rcpp::export]]
    arma::Row<size_t> kmeansDemo() {
    
        // Originally written to use RandomPartition, and is left that
        // way because RandomPartition gives better initializations here.
        mlpack::KMeans<mlpack::EuclideanDistance, mlpack::RandomPartition> kmeans;
    
        // mlpack::KMeans<> kmeans;    // default arguments as an alternative
    
        arma::Row<size_t> assignments;
        kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);
    
        return assignments;
    }
    
    /*** R
    kmeansDemo()
    */
    

    Output

    > Rcpp::sourceCpp("~/git/stackoverflow/76336745/answer.cpp")
    
    > kmeansDemo()
    [INFO ] KMeans::Cluster(): iteration 1, residual 13.7285.
    [INFO ] KMeans::Cluster(): iteration 2, residual 2.51215e-15.
    [INFO ] KMeans::Cluster(): converged after 2 iterations.
    [INFO ] 186 distance calculations.
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
    [1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     0     0     0     0
         [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
    [1,]     0     0     0     1     1     1     1     1     1     1     1     1     1
    > 
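The labels 0, 1 and 2 in the output above are arbitrary cluster indices, so what matters is which points share a label, not the number itself. As a minimal sketch in plain C++ (no Rcpp or mlpack required), one could tally how many points landed in each cluster. The helper names `clusterSizes` and `demoLabels` are illustrative assumptions, not part of mlpack, and the label vector is simply copied from the output shown above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): count how many points carry
// each label in the range 0..k-1.
std::vector<std::size_t> clusterSizes(const std::vector<std::size_t>& labels,
                                      std::size_t k) {
    std::vector<std::size_t> sizes(k, 0);
    for (std::size_t label : labels) {
        if (label < k) {  // ignore any label outside 0..k-1
            ++sizes[label];
        }
    }
    return sizes;
}

// Labels copied from the output shown above:
// 13 points labelled 2, then 7 labelled 0, then 10 labelled 1.
std::vector<std::size_t> demoLabels() {
    std::vector<std::size_t> labels;
    labels.insert(labels.end(), std::size_t{13}, std::size_t{2});
    labels.insert(labels.end(), std::size_t{7}, std::size_t{0});
    labels.insert(labels.end(), std::size_t{10}, std::size_t{1});
    return labels;
}
```

Calling `clusterSizes(demoLabels(), 3)` gives the sizes 7, 10 and 13 for labels 0, 1 and 2, which matches the 13, 7 and 10 rows of the three classes in the data set, just under different label numbers.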
    

    Packages

    > sapply(c("RcppArmadillo", "RcppEnsmallen", "mlpack"), \(x) format(packageVersion(x)))
    RcppArmadillo RcppEnsmallen        mlpack 
     "0.12.4.1.0"  "0.2.19.0.1"       "4.2.0" 
    >
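The error in the question (`mlpack/core.hpp: No such file or directory`) means the preprocessor could not find that header on any include path it was given. A small C++17 probe based on `__has_include` can confirm whether a header is visible with the flags in use. This is only a diagnostic sketch: the macro and function names (`HAVE_MLPACK_CORE`, `HAVE_IOSTREAM`, `describe`, `headerReport`) are illustrative assumptions and are not provided by mlpack or Rcpp.

```cpp
#include <string>

// __has_include is a standard preprocessor feature since C++17; the outer
// check keeps the file compiling on toolchains that lack it.
#if defined(__has_include)
#  if __has_include(<mlpack/core.hpp>)
#    define HAVE_MLPACK_CORE 1
#  else
#    define HAVE_MLPACK_CORE 0
#  endif
#  if __has_include(<iostream>)
#    define HAVE_IOSTREAM 1
#  else
#    define HAVE_IOSTREAM 0
#  endif
#else
#  define HAVE_MLPACK_CORE 0
#  define HAVE_IOSTREAM 0
#endif

// True only if the header was visible when this file was compiled.
constexpr bool kHaveMlpackCore = (HAVE_MLPACK_CORE == 1);
// Sanity check: a standard header that should always be visible.
constexpr bool kHaveIostream = (HAVE_IOSTREAM == 1);

// Turn a probe result into readable text.
const char* describe(bool found) {
    return found ? "found" : "not found";
}

// Build a two-line report covering both probes.
std::string headerReport() {
    std::string report;
    report += std::string("<iostream>: ") + describe(kHaveIostream) + "\n";
    report += std::string("<mlpack/core.hpp>: ") + describe(kHaveMlpackCore) + "\n";
    return report;
}
```

If the report says `<mlpack/core.hpp>` is not found while `<iostream>` is, the include paths passed to the compiler do not contain an `mlpack/` directory with `core.hpp` in it, which is the situation the original error describes.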