python pandas nullable fillna

What is the 'fillna()' equivalent for dtype 'Int32'?


Short question: How can I set all values that are <1 or <NA> to 1?

Long question: Say I have a pure-int (int32!) pandas column. I can do this to cap the minimum:

>>> shots = pd.DataFrame([2, 0, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='int32')
>>> shots
     shots
foo      2
bar      0
baz      1

>>> max(shots.loc['foo', 'shots'], 1)
2

>>> max(shots.loc['bar', 'shots'], 1)
1

So far, so good. Now, say the dtype of column shots changes from 'int32' to 'Int32', allowing <NA>. This gets me into trouble when accessing <NA> records. I get this error:

>>> shots = pd.DataFrame([2, np.nan, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='Int32')
>>> shots
     shots
foo      2
bar   <NA>
baz      1

>>> max(shots.loc['bar', 'shots'], 1)
TypeError: boolean value of NA is ambiguous

What should I do?

My first intuition was to say, "OK, let's fill the values, then apply max()." But that also fails:

>>> shots.loc[idx, 'shots'].fillna(1)

AttributeError: 'NAType' object has no attribute 'fillna'

So: what is the most pandastic/pythonic way to apply a condition to <NA> values, i.e., setting all <NA> to 1, or applying some other form of basic math, such as max(<NA>, 1)?

Versions

  • Python 3.8.6
  • Pandas 1.2.3
  • Numpy 1.19.2

Solution

  • idx should be a collection (e.g. a list). If it is a scalar, .loc returns a scalar value rather than a Series, and a scalar has no fillna():

    # idx = 'bar'
    
    >>> shots.loc[idx, 'shots']
    <NA>
    
    >>> shots.loc[idx, 'shots'].fillna(1)
    ...
    AttributeError: 'NAType' object has no attribute 'fillna'
    
    >>> shots.loc[[idx], 'shots'].fillna(1)
    bar    1
    Name: shots, dtype: Int32
    

    So the question is: how is idx defined?
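
For the short question (set every value that is <1 or <NA> to 1), one possible approach is to work on the whole column instead of single cells. The following is a minimal sketch, not a definitive answer: the sample data and the helper name at_least_one are made up for illustration. It fills <NA> first and then uses Series.clip to enforce the minimum; the helper shows a scalar-safe variant of max(value, 1).

```python
import pandas as pd

# Hypothetical data mirroring the question: one <NA> and one value below 1.
shots = pd.DataFrame(
    {"shots": [2, pd.NA, 0]},
    index=["foo", "bar", "baz"],
    dtype="Int32",
)

# Whole column: replace <NA> with 1 first, then raise anything below 1 to 1.
shots["shots"] = shots["shots"].fillna(1).clip(lower=1)


def at_least_one(value):
    """Scalar counterpart (hypothetical helper): <NA> becomes 1, else max(value, 1)."""
    return 1 if pd.isna(value) else max(value, 1)
```

Checking pd.isna() before calling max() avoids the "boolean value of NA is ambiguous" error, because the comparison inside max() is never evaluated against <NA>.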


    Old answer

    Your problem is not reproducible for me.

    shots = pd.DataFrame({'shots': [2, 1, pd.NA]}, dtype=pd.Int32Dtype())
    idx = [2]
    
    >>> shots
       shots
    0      2
    1      1
    2   <NA>
    
    >>> shots.dtypes
    shots    Int32
    dtype: object
    
    >>> shots.loc[idx, 'shots'].fillna(1)
    2    1
    Name: shots, dtype: Int32
    

    Versions:

    • Python 3.9.7
    • Pandas 1.4.1
    • Numpy 1.21.5