rapids cudf

Replace values in Column C where value in Column A is x


Issue

While replacing null values so that the column can become boolean, we found null values in the fireplace_count column.

If the fireplaceflag value is False, the null fireplace_count value should be replaced with 0.

Written for pandas:

df_train.loc[(df_train.fireplace_count.isnull()) & (df_train.fireplaceflag==False),'fireplace_count'] = 0
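
To make the intended behavior concrete, here is a minimal pandas sketch of that statement on a made-up four-row frame. The data is hypothetical and only there for illustration. Only the row where fireplace_count is null and fireplaceflag is False receives a 0; a null count next to a True flag is left untouched.

```python
import pandas as pd

# Hypothetical toy data: boolean flags, with nulls in fireplace_count.
df_train = pd.DataFrame({
    'fireplace_count': [2.0, None, None, 1.0],
    'fireplaceflag': [True, False, True, True],
})

# Select rows where the count is missing AND the flag is False.
mask = df_train['fireplace_count'].isnull() & (df_train['fireplaceflag'] == False)

# Assign 0 to fireplace_count on the selected rows only.
df_train.loc[mask, 'fireplace_count'] = 0

print(df_train)
```

Row 1 (flag False) is filled with 0, while row 2 (flag True) keeps its null.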

Solution

  • I suggest calling fillna() on just the column you want to target, like:

    df['<column_name>'] = df['<column_name>'].fillna(<new_value>)

    Put the value that should replace the nulls inside the parentheses; in your case that is 0. Let's also simplify the problem, since it seems that a null fireplace_count only occurs when the flag is False.

    I'm going to use the DataFrame that you sent me earlier, with one minor change.

    import cudf

    df = cudf.DataFrame({'basement_flag': [1, 1, 1, 0],
                         'basementsqft': [400, 750, 500, 0],
                         # Added a None to illustrate the targeted nature of the solution.
                         'fireplace_count': [2, None, None, 1],
                         'fireplaceflag': [10, None, None, 8]})
    print(df)
    # This is the solution. It changes only the values in the column of interest,
    # which is what you explained that you needed.
    df['fireplace_count'] = df.fireplace_count.fillna(0)
    print(df)
    

    Output would be:

       basement_flag  basementsqft  fireplace_count  fireplaceflag
    0              1           400                2             10
    1              1           750                                
    2              1           500                                
    3              0             0                1              8
       basement_flag  basementsqft  fireplace_count  fireplaceflag
    0              1           400                2             10
    1              1           750                0               
    2              1           500                0               
    3              0             0                1              8
    

    There is also this alternative, which fills both columns and then updates the flag:

    df['fireplace_count'] = df['fireplace_count'].fillna(0)
    df['fireplaceflag']= df['fireplaceflag'].fillna(-1)
    df['fireplaceflag'] = df['fireplaceflag'].masked_assign(1, df['fireplace_count'] > 0)
    

    That should cover any odd cases, based on what I think your question is. (Thanks to Roy F @ NVIDIA.)

    Let me know if this works for you, or if you need more help!
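
As a follow-up sketch: `masked_assign` may not be available in every cuDF release, so the three-step alternative above can also be written with a boolean mask and `.loc`, which pandas supports and which cuDF is expected to support as well. The `xdf` alias and the fallback to pandas are assumptions added here so the snippet runs without a GPU; they are not part of the original answer.

```python
# Use cuDF when it is installed; otherwise fall back to pandas.
# (The fallback is an assumption for illustration only.)
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

df = xdf.DataFrame({
    'fireplace_count': [2, None, None, 1],
    'fireplaceflag': [10, None, None, 8],
})

# Step 1: nulls in the count column become 0.
df['fireplace_count'] = df['fireplace_count'].fillna(0)

# Step 2: nulls in the flag column become the sentinel value -1.
df['fireplaceflag'] = df['fireplaceflag'].fillna(-1)

# Step 3: wherever at least one fireplace is recorded, set the flag to 1.
has_fireplace = df['fireplace_count'] > 0
df.loc[has_fireplace, 'fireplaceflag'] = 1

print(df)
```

After these steps, rows with a positive count carry a flag of 1 and rows that had no count carry a flag of -1 with a count of 0.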