
Why doesn't fillna work as expected in pandas version 2.1.4?


This is my DataFrame:

import pandas as pd 
df = pd.DataFrame(
    {
        'a': ['long', 'long', 'short', 'long', 'short', 'short', 'short'],
        'b': [1, -1, 1, 1, -1, -1, 1],
    }
)

The expected output adds a column a_1:

        a    b       a_1
0     long   1       long
1     long  -1       long
2    short   1      short
3     long   1       long
4    short  -1       long
5    short  -1       long
6    short   1      short

Logic:

a_1 should be created like this:

df.loc[df.b.eq(-1), 'a_1'] = 'long'
df['a_1'] = df.a_1.fillna(df.a)

This is strange: fillna does not work here. The same code worked with pandas 1.2.4 but does not work with 2.1.4, which is currently the default version on Colab, where I ran it.


Solution

  • This appears to be caused by pandas 2.1.4 filling the unassigned rows with the string 'nan' instead of a real NaN when a new string column is created from a partial assignment, in which case fillna would find no missing values to fill. Whatever the cause, assigning values through a boolean condition and then patching the remaining rows is not the recommended pattern in pandas. The mask method is designed for exactly this situation, so use it:

    df['a_1'] = df['a'].mask(df['b'].eq(-1), 'long')
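
For completeness, here is a minimal self-contained sketch of the mask approach on the DataFrame from the question, together with the equivalent Series.where form. The helper count_nan_strings is a hypothetical name introduced only for this illustration; it can be used to test the guess above by checking whether a column holds the literal string 'nan' rather than real missing values. What the last line prints may differ between pandas versions, so no output is shown for it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['long', 'long', 'short', 'long', 'short', 'short', 'short'],
        'b': [1, -1, 1, 1, -1, -1, 1],
    }
)

# mask replaces values where the condition is True and keeps the others,
# so no intermediate column with missing values is ever created.
df['a_1'] = df['a'].mask(df['b'].eq(-1), 'long')

# Series.where is the inverse: it keeps values where the condition is True.
alt = df['a'].where(df['b'].ne(-1), 'long')

print(df)


def count_nan_strings(s):
    """Hypothetical helper: count entries that are the literal string 'nan'."""
    return int(s.map(lambda v: isinstance(v, str) and v == 'nan').sum())


# Reproduce the original two-step approach on a copy and inspect the
# intermediate column: real missing values vs. 'nan' strings.
tmp = df[['a', 'b']].copy()
tmp.loc[tmp['b'].eq(-1), 'a_1'] = 'long'
print(tmp['a_1'].isna().sum(), count_nan_strings(tmp['a_1']))
```

If the second number printed is non-zero, the column contains 'nan' strings, which would be consistent with the explanation above; the mask version does not depend on that behaviour either way.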