Tags: python, machine-learning, rapids, cudf

AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'


I am trying to split my data into training and validation sets using train_test_split from the cuml.preprocessing.model_selection module, but I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-317-4e11838456ea> in <module>
----> 1 X_train, X_test, y_train, y_test = train_test_split(train_dfIF,train_y, test_size=0.20, random_state=42)

/opt/conda/lib/python3.7/site-packages/cuml/preprocessing/model_selection.py in train_test_split(X, y, test_size, train_size, shuffle, random_state, seed, stratify)
    454         X_train = X.iloc[0:train_size]
    455         if y is not None:
--> 456             y_train = y.iloc[0:train_size]
    457 
    458     if hasattr(X, "__cuda_array_interface__") or \

AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'

I am not using iloc anywhere in my own code, though.

Here is the code:

from cuml.preprocessing.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(train_dfIF,train_y, test_size=0.20, random_state=42)

Here train_dfIF is a cuDF DataFrame and train_y is a CuPy array.


Solution

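  • You cannot (currently) pass an array to the y parameter when your X parameter is a DataFrame. The traceback shows why: for DataFrame input, train_test_split slices both X and y with .iloc, and a CuPy array has no such attribute. I would recommend passing two cuDF objects (a DataFrame and a Series) or two arrays, not one of each.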

    from cuml.preprocessing.model_selection import train_test_split
    import cudf
    
    # Small cuDF DataFrame of features and a cuDF Series of targets
    df = cudf.DataFrame({
        "a": range(5),
        "b": range(5),
    })
    y = cudf.Series(range(5))
    
    # Fails: X is a DataFrame but y.values is a CuPy array, which has no .iloc
    # train_test_split(df, y.values, test_size=0.20, random_state=42)
    
    # Works: DataFrame features with a Series target
    X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.20, random_state=42)
    
    # Works: CuPy arrays for both features and target
    X_train, X_test, y_train, y_test = train_test_split(df.values, y.values, test_size=0.20, random_state=42)
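    If you would rather keep train_dfIF as a DataFrame, one option is to wrap the array target before splitting. Below is a minimal sketch of that idea. The helper name match_target_to_features and its to_series parameter are hypothetical and not part of cuML or cuDF. It only checks for an .iloc attribute, which mirrors what the traceback suggests cuML does but is not cuML's actual logic.

```python
def match_target_to_features(X, y, to_series):
    """Return y in a container that matches a DataFrame-like X.

    Hypothetical helper (not a cuML/cuDF API). If X exposes .iloc
    (DataFrame-like) but y does not (array-like), y is wrapped with
    the `to_series` callable. Otherwise y is returned unchanged.
    """
    if hasattr(X, "iloc") and not hasattr(y, "iloc"):
        return to_series(y)
    return y
```

    For the case in the question this would be called as train_y = match_target_to_features(train_dfIF, train_y, cudf.Series) before train_test_split, which amounts to writing train_y = cudf.Series(train_y) by hand.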