python, pandas, dataframe, nan, cudf

Replace integers with np.NaN in cudf dataframe


I have a dataframe like this

import cudf
import numpy as np

df_a = cudf.DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['values'] = [1, 2, np.nan, 3, np.nan]

and I would like to replace all 2s with np.nan.

In a pandas dataframe I would usually use df_a[df_a==2]=np.nan,

but with a cudf dataframe I get: cannot broadcast <class 'int'>

When I use df_a[df_a['values']==2] = np.nan, I cannot make sense of the result.

Using df_a.replace(2, np.NaN)

gives me: cannot convert float NaN to integer

The original dataframe is very large, so I want to avoid loops, and it may contain different datatypes, meaning the 2s could also be floats.


Solution

  • I can't find a good reference for this, but using None instead of np.nan seems to do the trick:

    from cudf import DataFrame
    from numpy import nan
    
    df_a = DataFrame()
    df_a['key'] = [0, 1, 2, 3, 4]
    df_a['values'] = [1,2, nan,3,nan]
    print(df_a)
    #    key values
    # 0    0      1
    # 1    1      2
    # 2    2   <NA>
    # 3    3      3
    # 4    4   <NA>
    
    # Boolean mask of every cell equal to 2 (in both 'key' and 'values');
    # assigning None sets those cells to null
    mask = df_a==2
    df_a[mask] = None
    print(df_a)
    #     key values
    # 0     0      1
    # 1     1   <NA>
    # 2  <NA>   <NA>
    # 3     3      3
    # 4     4   <NA>
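
Note that the mask above is built over the whole dataframe, so the 2 in the key column (row 2) is nulled as well. The following is a minimal sketch of the difference between masking every column and masking only 'values'. It is written in pandas so it can be run without a GPU; the names everywhere and values_only are illustrative. cudf mirrors much of the pandas API, but whether these exact calls behave the same way in cudf is an assumption that has not been checked here. One visible difference: pandas upcasts an integer column to float so it can hold NaN, whereas the cudf output above shows <NA> instead.

```python
import numpy as np
import pandas as pd

# Same data as above, in pandas (assumption: illustration only, not cudf).
df_a = pd.DataFrame({
    "key": [0, 1, 2, 3, 4],
    "values": [1, 2, np.nan, 3, np.nan],
})

# Mask over the whole frame: every cell equal to 2, in any column.
# NaN == 2 is False, so cells that are already missing are left alone.
everywhere = df_a.mask(df_a == 2)

# Mask over one column only: 'key' keeps its 2, 'values' loses its 2.
values_only = df_a.copy()
values_only["values"] = values_only["values"].mask(values_only["values"] == 2)

print(everywhere.dtypes["key"])  # 'key' became a float column to hold NaN
print(values_only)
```

If only some columns should be affected, building the mask from those columns (rather than from the whole dataframe) avoids touching key-like columns by accident.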