I want to convert a cuDF DataFrame to a CuPy ndarray. I'm using the code below:
import time
import numpy as np
import cupy as cp
import cudf
from numba import cuda
df = cudf.read_csv('titanic.csv')
arr_cupy = cp.fromDlpack(df.to_dlpack())
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-176-0d6ff9785189> in <module>
----> 1 arr_cupy = cp.fromDlpack(df.to_dlpack())
~/.conda/envs/rapids_013/lib/python3.7/site-packages/cudf/core/dataframe.py in to_dlpack(self)
3821 import cudf.io.dlpack as dlpack
3822
-> 3823 return dlpack.to_dlpack(self)
3824
3825 @ioutils.doc_to_csv()
~/.conda/envs/rapids_013/lib/python3.7/site-packages/cudf/io/dlpack.py in to_dlpack(cudf_obj)
72 )
73
---> 74 return libdlpack.to_dlpack(gdf_cols)
cudf/_libxx/dlpack.pyx in cudf._libxx.dlpack.to_dlpack()
ValueError: Cannot create a DLPack tensor with null values. Input is required to have null count as zero.
I'm getting this error because the dataset has null values. How can I do the conversion?
Let's cover your two issues :)
From cudf df to cupy ndarray: You can use as_gpu_matrix and cast the result to a cupy array as below. This keeps everything on the GPU and is pretty efficient.
arr_cupy = cp.array(df.as_gpu_matrix())
https://docs.rapids.ai/api/cudf/stable/api_docs/api/cudf.DataFrame.as_gpu_matrix.html
There may be a more direct way in the future (or even now, one I'm not yet aware of). If for some reason you need DLPack, your way works too, once the nulls are dealt with. That brings us to the second issue...
Null Values: to fill in your null values, you should use .fillna(). Use a value that you can tell is out of place.
https://docs.rapids.ai/api/cudf/stable/api_docs/api/cudf.DataFrame.fillna.html
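One way to make sure the fill value really is "out of place" is to check that it doesn't already occur in the data before filling. This is only a rough sketch: the frame and column names are made up (not the titanic file), and the pandas fallback is my own addition so the snippet can be tried on a machine without a GPU.

```python
try:
    import cudf as xdf      # GPU path
except ImportError:
    import pandas as xdf    # CPU stand-in (assumption, for trying it out only)

# Made-up data with one missing entry
df = xdf.DataFrame({"id": [0, 1, 2], "fare": [7.25, None, 8.05]})

sentinel = -1

# True if the sentinel already appears as a real value somewhere in the frame
collides = bool((df == sentinel).any().any())

# Replace every null with the sentinel
filled = df.fillna(sentinel)

# Total nulls left after filling
remaining_nulls = int(filled.isnull().sum().sum())
```

If collides comes back True, pick a different sentinel, otherwise you won't be able to tell the filled entries apart from real ones later.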
Together, they can look like this:
arr_cupy = cp.array(df.fillna(-1).as_gpu_matrix())
Output type is cupy.core.core.ndarray
Output array from my test df is:
array([[ 0, 17444256, 1200],
[ 1, 616285571, 987],
[ 2, -1, 407],
...,
where -1 is the null I artificially created.
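If you want to see the whole flow in one place (count the nulls, fill them, convert), here is a small sketch. A few assumptions on my side: the data is made up, I use .values instead of as_gpu_matrix() because the same attribute also exists in pandas, and I'm assuming a cuDF version where DataFrame.values hands back a CuPy array. The pandas/NumPy fallback is only there so it runs without a GPU.

```python
try:
    import cudf as xdf
    import cupy as xp
except ImportError:
    # CPU stand-ins (assumption, for trying it out only)
    import pandas as xdf
    import numpy as xp

# Made-up numeric frame with one missing entry
df = xdf.DataFrame({"id": [0, 1, 2], "fare": [10.0, None, 30.0]})

# Per-column null counts; any non-zero count is what makes to_dlpack() fail
null_counts = df.isnull().sum()

# Fill the nulls with a sentinel, then convert to a 2-D array
filled = df.fillna(-1)
arr = xp.asarray(filled.values)
```

On the GPU path arr should be a CuPy array that never left the device; on the fallback path it is a plain NumPy array with the same shape and values.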
Hope that helps!