rapids, cudf

'nvstrings' object has no attribute 'to_gpu_array'


I'm using cuML for stochastic gradient descent. I used sklearn's train_test_split to generate the splits for train_X, train_y ... from a cuDF dataframe.

The following code (I removed the hyperparameters which aren't relevant to this question):

from cuml.solvers import SGD as cumlSGD
cu_sgd = cumlSGD(eta0=0.005)
cu_sgd.fit(train_X, train_y)

Throws the following error on the cu_sgd.fit line: 'nvstrings' object has no attribute 'to_gpu_array'

How can I get around this issue?


Solution

  • The solution is to first convert any column in train_X or train_y that has the string dtype to the category dtype. String columns can't be converted with to_gpu_array because their values are not fixed-width. After the conversion the column holds integer category codes rather than the actual string values, but the strings can be reconstructed from the categories, and cu_sgd.fit should work fine.
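As a rough illustration of that conversion, here is a minimal sketch. It uses pandas as a stand-in so it runs without a GPU; cuDF mirrors the `astype("category")` and `.cat.codes` calls, but that should be checked against your cuDF version. The helper name `encode_string_columns` and the sample data are hypothetical, and replacing each column with its integer codes (rather than leaving it as a category dtype) is an extra assumption made to keep the inputs purely numeric.

```python
import pandas as pd  # stand-in for cudf in this sketch


def encode_string_columns(df):
    """Return a copy of df with string columns replaced by integer category
    codes, plus a dict mapping each encoded column to its categories."""
    encoded = df.copy()
    categories = {}
    for col in encoded.columns:
        if pd.api.types.is_string_dtype(encoded[col]):
            as_cat = encoded[col].astype("category")
            # Keep the categories so the original strings can be rebuilt later.
            categories[col] = as_cat.cat.categories
            encoded[col] = as_cat.cat.codes
    return encoded, categories


# Hypothetical training frame with one string column and one numeric column.
train_X = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.0, 3.0]})
encoded_X, categories = encode_string_columns(train_X)

# Rebuild the original strings from the stored categories and the codes.
restored = [categories["color"][code] for code in encoded_X["color"]]
```

The encoded frame contains only fixed-width numeric columns, which is what the fit call needs; the `categories` dict is what lets you map the codes back to strings afterwards.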