Build issue with LightGBM 2.2.4, Boost 1.64.0 on Power9 w/GPU


I am attempting to build LightGBM version 2.2.4 (git hash 5256cda69300d6b83b18180da2992a1e50a6b392) on an IBM Power9 system ("Witherspoon", a Power System AC922, 8335-GTH) running Red Hat Enterprise Linux Server 7.5 (Maipo).

I am using the RHEL-packaged C compiler (gcc 4.8.5), a local installation of cmake (version 3.13.1), and a local installation of Boost (version 1.64.0). The system has CUDA 9.2 installed, and I have located the libOpenCL directories and include files.

My configuration operation is (from inside a newly-created build directory in the root of the unpacked LightGBM tree):

    # export BOOST_ROOT=/share/sw/boost/1_64_0/
    # cmake3 -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/lib64/nvidia/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/include/CL/ ..
    # make

The configuration step apparently succeeds, generating a runnable makefile.

The build fails at around 41% with errors from deep in the bowels of Boost:



    [ 41%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
    In file included from /share/sw/boost/1_64_0/include/boost/mpl/aux_/integral_wrapper.hpp:22:0,
                     from /share/sw/boost/1_64_0/include/boost/mpl/int.hpp:20,
                     from /share/sw/boost/1_64_0/include/boost/mpl/lambda_fwd.hpp:23,
                     from /share/sw/boost/1_64_0/include/boost/mpl/aux_/na_spec.hpp:18,
                     from /share/sw/boost/1_64_0/include/boost/mpl/identity.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/iterator/detail/enable_if.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/iterator/transform_iterator.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/iter_find.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/split.hpp:16,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/context.hpp:19,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/buffer.hpp:15,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/core.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:27,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:18: error: pasting ")" and "20" does not give a valid preprocessing token
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
                      ^
    /share/sw/boost/1_64_0/include/boost/preprocessor/cat.hpp:29:34: note: in definition of macro ‘BOOST_PP_CAT_I’
     #    define BOOST_PP_CAT_I(a, b) a ## b
                                      ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:5: note: in expansion of macro ‘BOOST_PP_CAT’
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
         ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:49: note: in expansion of macro ‘AUX778076_VECTOR_HEADER’
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)
                                                     ^
    In file included from /share/sw/boost/1_64_0/include/boost/math/policies/policy.hpp:14:0,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/math_fwd.hpp:28,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/sign.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/inf_nan.hpp:34,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:63,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical.hpp:54,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/try_lexical_convert.hpp:42,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/detail/meta_kernel.hpp:23,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/iterator/buffer_iterator.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/detail/copy_on_device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/copy.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/container/vector.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:28,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:73: fatal error: boost/mpl/__attribute__((altivec(vector__)))/__attribute__((altivec(vector__)))20.hpp: No such file or directory
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)

From the messages, it looks like some preprocessor string manipulation has gone wrong. It is presumably trying to find the "vector20.hpp" file in the boost/mpl/vector include directory, but the BOOST_PP_CAT operation fails, so it cannot construct a proper filename. "altivec" is also implicated: the Power9 CPU is AltiVec-capable, so maybe an additional header or compiler switch is required?
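The log is consistent with the identifier `vector` being visible as a macro when Boost builds its header name: `BOOST_PP_CAT(vector, 20)` then expands `vector` first and tries to paste `)` onto `20`, and the stringized path picks up the attribute text. A minimal sketch of that mechanism follows. It defines the macro by hand, copying its text from the error message, so it runs on any platform. Where the real definition comes from on the Power9 build is an assumption not established here, and the `DEMO_*` names and helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Two-level stringize, the same shape as BOOST_PP_STRINGIZE: the outer macro
// lets its argument be macro-expanded before the inner one applies '#'.
#define DEMO_STRINGIZE_I(x) #x
#define DEMO_STRINGIZE(x) DEMO_STRINGIZE_I(x)

// No macro named 'vector' is visible here, so the path survives intact.
inline const char* clean_path() {
    return DEMO_STRINGIZE(boost/mpl/vector/vector20.hpp);
}

// ASSUMPTION: hand-written stand-in for the AltiVec 'vector' keyword macro,
// using the replacement text that appears in the build log. Defined only
// after the standard headers are included, and for demonstration only.
#define vector __attribute__((altivec(vector__)))

// The bare 'vector' token is now replaced before stringizing; 'vector20' is
// a different identifier and is left alone. (Boost builds that part with
// BOOST_PP_CAT(vector, 20), which is where the invalid paste is reported.)
inline const char* polluted_path() {
    return DEMO_STRINGIZE(boost/mpl/vector/vector20.hpp);
}

#undef vector

// Print both spellings side by side.
inline void show_paths() {
    assert(std::strcmp(clean_path(), polluted_path()) != 0);
    std::printf("clean:    %s\n", clean_path());
    std::printf("polluted: %s\n", polluted_path());
}
```

Calling `show_paths()` from a `main` prints the intended header path next to the corrupted one, which has the same `boost/mpl/__attribute__((altivec(vector__)))/...` shape as the path in the fatal error.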

I can successfully build (with warnings) on a Debian 9 "stretch" system with x86_64 architecture and CUDA 9.1 (for the libOpenCL stuff), with the Debian-packaged Boost version 1.62.

I also tried building the Power9 version against Boost 1.69, and against Boost 1.62 (the one that worked on Debian), and got the same errors in the same place.

Help?


Solution

  • This is addressed in an issue on the LightGBM GitHub, which I somehow missed in my initial search.

    This build attempt is misguided.

    The compilation problem is apparently an AltiVec/Boost interaction. Independently of that, there is no OpenCL GPU support on the Power architecture, and LightGBM's GPU support is OpenCL under the hood, so the effort is doomed in any case.