Memory allocation error on worker 0: std::bad_alloc: CUDA error

DESCRIPTION

I am just trying to create a training set and a test set for the model, but I get the errors below.
1st data package - train_data = xgboost.DMatrix(data=X_train, label=y_train). If I run only this line, training and everything else that uses it works without an error message.
2nd data package - test_data = xgboost.DMatrix(data=X_test, label=y_test). This runs a couple of cells further down; the two lines are not executed together.

ENVIRONMENT

followed guide - https://github.com/rapidsai-community/notebooks-contrib/blob/branch-0.14/intermediate_notebooks/E2E/synthetic_3D/rapids_ml_workflow_demo.ipynb
conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.16 python=3.7 cudatoolkit=10.2
AWS EC2: Deep Learning AMI (Ubuntu 18.04) Version 36.0 - ami-063585f0e06d22308: MXNet-1.7.0, TensorFlow-2.3.1, 2.1.0 & 1.15.3, PyTorch-1.4.0 & 1.7.0, Neuron, & others. NVIDIA CUDA, cuDNN, NCCL, Intel MKL-DNN, Docker, NVIDIA-Docker & EFA support. For fully managed experience, check: https://aws.amazon.com/sagemaker
AWS EC2 instance - g4dn.4xlarge - 16GB VRAM, 64 GB RAM

Side Note

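The memory sizes reported in the errors are NOT 30 GB or 15 GB:
- 1 539 047 424 bytes = 1.5 GB (free memory, first error),
- 3 091 258 960 bytes = 3 GB (requested memory, first error),
- 3 015 442 432 bytes = 3 GB (free memory, second error),
- 3 091 258 960 bytes = 3 GB (requested memory, second error).
- The GPU has 16 GB VRAM, so I don't think "not enough GPU memory" answers the question.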
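
The byte counts above can be checked with a few lines of Python. This is only a sketch: `parse_oom_report` is a hypothetical helper (not part of XGBoost) that pulls the two numbers out of the "Free memory" / "Requested memory" lines of the error text and reports how far the request exceeds what was free when the allocation was attempted.

```python
import re

GIB = 1024 ** 3  # bytes per GiB


def parse_oom_report(message):
    """Extract the free/requested byte counts from an XGBoost OOM message."""
    free = re.search(r"Free memory:\s*(\d+)", message)
    requested = re.search(r"Requested memory:\s*(\d+)", message)
    if free is None or requested is None:
        raise ValueError("message does not contain an OOM memory report")
    free_bytes = int(free.group(1))
    requested_bytes = int(requested.group(1))
    return {
        "free_bytes": free_bytes,
        "requested_bytes": requested_bytes,
        # How many more bytes would have been needed for the allocation.
        "shortfall_bytes": max(requested_bytes - free_bytes, 0),
    }


# The two report lines copied from the first error.
message = """- Free memory: 1539047424
- Requested memory: 3091258960"""

report = parse_oom_report(message)
for key, value in report.items():
    print(f"{key}: {value} bytes = {value / GIB:.2f} GiB")
```

Note that "Free memory" is what was left on the GPU at the moment of the failed allocation, not the card's total 16 GB; everything already resident on the device (the cuDF frames, the training DMatrix) counts against it.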

ERROR

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input-25-7bd66d4fabf4> in <module>
      1 #train = xgboost.DMatrix(data=X, label=y) #ORIGINAL
----> 2 test_data = xgboost.DMatrix(data=X_test, label=y_test)

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, enable_categorical)
    448             feature_names=feature_names,
    449             feature_types=feature_types,
--> 450             enable_categorical=enable_categorical)
    451         assert handle is not None
    452         self.handle = handle

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types, enable_categorical)
    543     if _is_cudf_df(data):
    544         return _from_cudf_df(data, missing, threads, feature_names,
--> 545                              feature_types)
    546     if _is_cudf_ser(data):
    547         return _from_cudf_df(data, missing, threads, feature_names,

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in _from_cudf_df(data, missing, nthread, feature_names, feature_types)
    400             ctypes.c_float(missing),
    401             ctypes.c_int(nthread),
--> 402             ctypes.byref(handle)))
    403     return handle, feature_names, feature_types
    404 

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in _check_call(ret)
    184     """
    185     if ret != 0:
--> 186         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    187 
    188 

XGBoostError: [12:32:18] /opt/conda/envs/rapids/conda-bld/xgboost_1603491651651/work/src/c_api/../data/../common/device_helpers.cuh:400: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
- Free memory: 1539047424
- Requested memory: 3091258960

Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(+0x13674f) [0x7fad04f7274f]
  [bt] (1) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::ThrowOOMError(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x3ad) [0x7fad05190b0d]
  [bt] (2) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry>::allocate(unsigned long)+0x1df) [0x7fad051ac11f]
  [bt] (3) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(thrust::detail::vector_base<xgboost::Entry, dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry> >::fill_insert(thrust::detail::normal_iterator<thrust::device_ptr<xgboost::Entry> >, unsigned long, xgboost::Entry const&)+0x26d) [0x7fad051d0d0d]
  [bt] (4) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::HostDeviceVector<xgboost::Entry>::Resize(unsigned long, xgboost::Entry)+0xc9) [0x7fad051d1cc9]
  [bt] (5) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int)+0x3df) [0x7fad052259cf]
  [bt] (6) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x133) [0x7fad051f3aa3]
  [bt] (7) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaceColumns+0xc6) [0x7fad0518c286]
  [bt] (8) /home/ubuntu/anaconda3/envs/rapids/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fae60078630]

CODE 2 - If I clear everything, restart the notebook, and execute both lines together in one cell:

train_data = xgboost.DMatrix(data=X_train, label=y_train) 
test_data = xgboost.DMatrix(data=X_test, label=y_test)

ERROR 2

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input-20-f0c3710678a8> in <module>
      1 #train = xgboost.DMatrix(data=X, label=y) #ORIGINAL
      2 train_data = xgboost.DMatrix(data=X_train, label=y_train)
----> 3 test_data = xgboost.DMatrix(data=X_test, label=y_test)

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, enable_categorical)
    448             feature_names=feature_names,
    449             feature_types=feature_types,
--> 450             enable_categorical=enable_categorical)
    451         assert handle is not None
    452         self.handle = handle

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types, enable_categorical)
    543     if _is_cudf_df(data):
    544         return _from_cudf_df(data, missing, threads, feature_names,
--> 545                              feature_types)
    546     if _is_cudf_ser(data):
    547         return _from_cudf_df(data, missing, threads, feature_names,

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in _from_cudf_df(data, missing, nthread, feature_names, feature_types)
    400             ctypes.c_float(missing),
    401             ctypes.c_int(nthread),
--> 402             ctypes.byref(handle)))
    403     return handle, feature_names, feature_types
    404 

~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in _check_call(ret)
    184     """
    185     if ret != 0:
--> 186         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    187 
    188 

XGBoostError: [15:20:36] /opt/conda/envs/rapids/conda-bld/xgboost_1603491651651/work/src/c_api/../data/../common/device_helpers.cuh:400: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
- Free memory: 3015442432
- Requested memory: 3091258960

Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(+0x13674f) [0x7f7eea73674f]
  [bt] (1) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::ThrowOOMError(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x3ad) [0x7f7eea954b0d]
  [bt] (2) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry>::allocate(unsigned long)+0x1df) [0x7f7eea97011f]
  [bt] (3) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(thrust::detail::vector_base<xgboost::Entry, dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry> >::fill_insert(thrust::detail::normal_iterator<thrust::device_ptr<xgboost::Entry> >, unsigned long, xgboost::Entry const&)+0x26d) [0x7f7eea994d0d]
  [bt] (4) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::HostDeviceVector<xgboost::Entry>::Resize(unsigned long, xgboost::Entry)+0xc9) [0x7f7eea995cc9]
  [bt] (5) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int)+0x3df) [0x7f7eea9e99cf]
  [bt] (6) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x133) [0x7f7eea9b7aa3]
  [bt] (7) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaceColumns+0xc6) [0x7f7eea950286]
  [bt] (8) /home/ubuntu/anaconda3/envs/rapids/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f8044f8d630]

Solution

As per this part of your error:

XGBoostError: [12:32:18] /opt/conda/envs/rapids/conda-bld/xgboost_1603491651651/work/src/c_api/../data/../common/device_helpers.cuh:400: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
- Free memory: 1539047424
- Requested memory: 3091258960

Your GPU doesn't have enough free memory for this particular single-GPU notebook.
The easiest solution is to use a p3 instance with the 32 GB GPUs (p3dn.24xlarge), or a p4d instance if you want to try the 40 GB A100s.
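
To get a feel for why the request is about 3 GB, here is a rough sizing sketch. It assumes that each stored value in the DMatrix is an `xgboost::Entry` (the type visible in the stack trace) and that an entry takes 8 bytes: a 4-byte feature index plus a 4-byte float. Both the 8-byte figure and the helper names are assumptions for illustration, and the estimate only covers dense data with no missing values.

```python
ENTRY_BYTES = 8  # assumption: 4-byte feature index + 4-byte float value


def estimate_dmatrix_bytes(n_rows, n_cols, entry_bytes=ENTRY_BYTES):
    """Rough size of the entry buffer for a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * entry_bytes


def fits(n_rows, n_cols, free_bytes, headroom=0.9):
    """True if the estimated buffer fits in a fraction of the free memory."""
    return estimate_dmatrix_bytes(n_rows, n_cols) <= free_bytes * headroom


# Numbers taken from the first error message.
requested = 3091258960
free = 1539047424

# Under the 8-byte assumption, the failed request corresponds to this many
# values (rows * columns) in X_test.
n_entries = requested // ENTRY_BYTES
print("values in the test matrix:", n_entries)
print("fits in free memory:", fits(n_entries, 1, free))
```

The same check can be run on the shape of `X_test` before calling `xgboost.DMatrix`, remembering that the cuDF frame itself stays on the GPU while the DMatrix is built.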

If you need to use a T4 on the g4 instances for some reason, or just want to get more practice with dask-cudf, a bit more effort on your part is required. You can:

- Use a multi-GPU g4dn.12xlarge, set up a Dask cluster, and use dask-cudf with xgboost.dask for multi-GPU boosting instead of single-GPU cudf. It will then work on your 16 GB T4s.
- Try the same dask-cudf and xgboost.dask approach on the smaller single-GPU g4 instances.
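
A minimal sketch of what the dask-cudf plus xgboost.dask route could look like is below. It is not the notebook's code: the function names, the default parameters, and the assumption that the label lives in a column of the same cuDF frame are all illustrative. It uses `LocalCUDACluster` from dask-cuda, `dask_cudf.from_cudf`, and `xgboost.dask.DaskDMatrix` / `xgboost.dask.train`.

```python
def feature_columns(columns, label_col):
    """Return every column name except the label column."""
    columns = list(columns)
    if label_col not in columns:
        raise KeyError(f"label column {label_col!r} not found")
    return [c for c in columns if c != label_col]


def train_multi_gpu(train_df, label_col, params=None, num_boost_round=100):
    """Train XGBoost across all local GPUs with dask-cudf and xgboost.dask.

    train_df is assumed to be a cudf.DataFrame that includes the label column.
    """
    # Imported lazily so this file can be loaded on a machine without RAPIDS.
    import dask_cudf
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from xgboost import dask as dxgb

    # Assumed defaults; "gpu_hist" is the GPU tree method in XGBoost 1.x.
    merged = {"tree_method": "gpu_hist", "objective": "reg:squarederror"}
    merged.update(params or {})

    # LocalCUDACluster starts one Dask worker per visible GPU.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        n_workers = len(client.scheduler_info()["workers"])
        # Split the frame so that each GPU holds only part of the data.
        ddf = dask_cudf.from_cudf(train_df, npartitions=n_workers)
        X = ddf[feature_columns(train_df.columns, label_col)]
        y = ddf[label_col]
        dtrain = dxgb.DaskDMatrix(client, X, y)
        result = dxgb.train(
            client, merged, dtrain, num_boost_round=num_boost_round
        )
    # xgboost.dask.train returns a dict with "booster" and "history".
    return result["booster"]
```

A call would look like `booster = train_multi_gpu(train_df, "label")`, where `"label"` is a placeholder column name. Because `from_cudf` starts from a frame that already sits on one GPU, data that does not fit there at all would need to be loaded directly into a dask-cudf frame instead (for example with `dask_cudf.read_csv`).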

A multi-GPU version would be an awesome community contribution.

If not, just use the p3 instance. I've opened an issue and we'll add a warning in a future notebooks-contrib PR. Thanks for making us aware of this!