I am having a bit of trouble running the examples in Spark's MLLib on a machine running Fedora 23. I have built Spark 1.6.2 with the following options per Spark documentation:
build/mvn -Pnetlib-lgpl -Pyarn -Phadoop-2.4 \
-Dhadoop.version=2.4.0 -DskipTests clean package
and upon running the binary classification example:
bin/spark-submit --class org.apache.spark.examples.mllib.BinaryClassification \
examples/target/scala-*/spark-examples-*.jar \
--algorithm LR --regType L2 --regParam 1.0 \
data/mllib/sample_binary_classification_data.txt
I receive the following error:
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.92-1.b14.fc23.x86_64/jre/bin/java: symbol lookup error: /tmp/jniloader5830472710956533873netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dscal
Errors of this form (symbol lookup error with netlib) are not limited to this particular example. On the other hand, the Elastic Net example (./bin/run-example ml.LinearRegressionWithElasticNetExample
) runs without a problem.
I have tried a number of solutions to no avail. For example, I went through some of the advice here https://datasciencemadesimpler.wordpress.com/tag/blas/, and while I can successfully import from com.github.fommil.netlib.BLAS
and LAPACK
, the aforementioned symbol lookup error persists.
I have read through the netlib-java documentation at fommil/netlib-java, and have ensured my system has the libblas
and liblapack
shared object files:
$ ls /usr/lib64 | grep libblas
libblas.so
libblas.so.3
libblas.so.3.5
libblas.so.3.5.0
$ ls /usr/lib64 | grep liblapack
liblapacke.so
liblapacke.so.3
liblapacke.so.3.5
liblapacke.so.3.5.0
liblapack.so
liblapack.so.3
liblapack.so.3.5
liblapack.so.3.5.0
The most promising advice I found was here http://fossdev.blogspot.com/2015/12/scala-breeze-blas-lapack-on-linux.html, which suggests including
JAVA_OPTS="- Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.NativeRefBLAS"
in the sbt
script. So, I included appended those options to _COMPILE_JVM_OPTS="..."
in the build/mvn
script, which also did not resolve the problem.
Finally, a last bit of advice I found online suggested passing the following flags to sbt
:
sbt -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS \
-Dcom.github.fommil.netlib.LAPACK=com.github.fommil.netlib.F2jLAPACK \
-Dcom.github.fommil.netlib.ARPACK=com.github.fommil.netlib.F2jARPACK
and again the issue persists. I am limited to two links in my post, but the advice can be found as the README.md of lildata's 'scaladatascience' repo on github.
Has anybody suffered this issue and successfully resolved it? Any and all help or advice is deeply appreciated.
It's been a couple months, but I got back to this problem and was able to get a functioning workaround (posting here in case anybody else has the same issue).
It came down to library precedence; so, by calling:
$ export LD_PRELOAD=/path/to/libopenblas.so
prior to launching Spark, everything works as expected.
I figured out the solution after reading: