python pandas fillna

Drop dataframe rows with values that are an array of NaN


I have a dataframe where, in one column, some values are not simply NaN but an array of NaNs (i.e., "[nan, nan, nan]").

I want to change those values to 0. If they were simply NaN, I would use:

df.fillna(0)

But that doesn't work in this case.

For example, given:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
         'ID':[1,2,3,4,5,6],
         'Version':[1,1,2,2,1,2],
         'Cost':[17,np.nan,24,[np.nan, np.nan, np.nan],13,8]})

Using df1.fillna(0) yields:

   ID  Version             Cost
0   1        1               17
1   2        1                0
2   3        2               24
3   4        2  [nan, nan, nan]
4   5        1               13
5   6        2                8

When I'd like to get the output:

   ID  Version             Cost
0   1        1               17
1   2        1                0
2   3        2               24
3   4        2                0
4   5        1               13
5   6        2                8

Solution

  • In your case, the Cost column has dtype object, so you can first convert it to numeric and then call fillna.

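    import numpy as np
    import pandas as pd
    
    # Same data as in the question: one plain NaN and one list of NaNs
    df = pd.DataFrame({"ID": list(range(1, 7)),
                       "Version": [1, 1, 2, 2, 1, 2],
                       "Cost": [17, np.nan, 24, [np.nan, np.nan, np.nan], 13, 8]})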
    

    Here df.dtypes shows:

    ID          int64
    Version     int64
    Cost       object
    dtype: object
    

    So you can convert this column with pd.to_numeric using errors='coerce', which assigns np.nan wherever conversion is not possible.

    df["Cost"] = pd.to_numeric(df["Cost"], errors='coerce')\
                   .fillna(0)
    

    Or, if you prefer, in two steps:

    df["Cost"] = pd.to_numeric(df["Cost"], errors='coerce')
    df["Cost"] = df["Cost"].fillna(0)
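
One caveat with errors='coerce': every value that cannot be parsed as a number becomes NaN, so the approach above also zeroes any other non-numeric entry in the column, not just the NaN arrays. If only plain NaN and all-NaN lists or arrays should become 0, a more targeted version might look like the sketch below. The helper name all_nan_to_zero is made up for illustration, and df1 is the frame from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Version": [1, 1, 2, 2, 1, 2],
    "Cost": [17, np.nan, 24, [np.nan, np.nan, np.nan], 13, 8],
})


def all_nan_to_zero(value):
    """Hypothetical helper: map NaN or an all-NaN list/array to 0."""
    if isinstance(value, (list, tuple, np.ndarray)):
        try:
            arr = np.asarray(value, dtype=float)
        except (TypeError, ValueError):
            # Not a purely numeric sequence: leave it untouched
            return value
        # Replace only non-empty sequences made up entirely of NaN
        return 0 if arr.size > 0 and np.isnan(arr).all() else value
    # Scalar case: plain NaN (or None) becomes 0, anything else is kept
    return 0 if pd.isna(value) else value


# Apply the helper element by element to the object column
df1["Cost"] = df1["Cost"].map(all_nan_to_zero)
print(df1)
```

This is slower than pd.to_numeric on large frames because it runs Python code per element, but it leaves any list that contains real numbers untouched instead of silently turning it into 0.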