Short question: How can I set all values that are `<1` or `<NA>` to 1?
Long question: Say I have a pure-int (`int32`!) pandas column. I can do this to cap the minimum:
>>> shots = pd.DataFrame([2, 0, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='int32')
>>> shots
     shots
foo      2
bar      0
baz      1
>>> max(shots.loc['foo', 'shots'], 1)
2
>>> max(shots.loc['bar', 'shots'], 1)
1
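Calling `max()` element by element works, but for a whole column a vectorized call is usually simpler. A minimal sketch, assuming the goal is to raise every value below 1 up to 1; `Series.clip(lower=...)` is standard pandas, and the variable name `capped` is only illustrative:

```python
import pandas as pd

# Same frame as above, using a list (not a set) for the column labels
shots = pd.DataFrame([2, 0, 1], index=['foo', 'bar', 'baz'],
                     columns=['shots'], dtype='int32')

# Vectorized equivalent of max(value, 1) applied to every row
capped = shots['shots'].clip(lower=1)
print(capped)
```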
So far, so good. Now, say the dtype of column `shots` changes from `'int32'` to `'Int32'`, allowing `<NA>`. This gets me in trouble when accessing `<NA>` records. I get this error:
>>> shots = pd.DataFrame([2, np.nan, 1], index=['foo', 'bar', 'baz'], columns=['shots'], dtype='Int32')
>>> shots
     shots
foo      2
bar   <NA>
baz      1
>>> max(shots.loc['bar', 'shots'], 1)
`TypeError: boolean value of NA is ambiguous`
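The error comes from how `pd.NA` behaves in comparisons: `max()` has to compare its two arguments, a comparison involving `pd.NA` returns `pd.NA` rather than `True`/`False`, and turning that into a boolean raises. A small sketch of that behaviour (plain pandas, nothing hypothetical):

```python
import pandas as pd

# Comparisons involving pd.NA propagate the missing value...
result = pd.NA > 1
print(result)

# ...and pd.NA refuses to be converted to True/False,
# which is what max() needs to do with the comparison result.
message = None
try:
    bool(pd.NA)
except TypeError as err:
    message = str(err)
print(message)
```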
What should I do?
My first intuition was: "OK, let's fill the missing values, then apply max()." But that also fails:
>>> shots.loc[idx, 'shots'].fillna(1)
AttributeError: 'NAType' object has no attribute 'fillna'
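The error shows that `shots.loc[idx, 'shots']` returned the scalar `pd.NA`, and that scalar has no `fillna` method. For a single value, one option is to test for missingness explicitly with `pd.isna` before calling `max()`. A minimal sketch; `floor_at_one` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def floor_at_one(value):
    """Hypothetical helper: return max(value, 1), treating <NA> as 1."""
    # On a scalar, pd.isna returns a plain bool (True for pd.NA, np.nan, None)
    if pd.isna(value):
        return 1
    return max(value, 1)

shots = pd.DataFrame([2, pd.NA, 1], index=['foo', 'bar', 'baz'],
                     columns=['shots'], dtype='Int32')
print(floor_at_one(shots.loc['bar', 'shots']))  # 1
print(floor_at_one(shots.loc['foo', 'shots']))  # 2
```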
What is the most pandas-idiomatic/pythonic way to apply a condition to `<NA>` values, i.e., setting all `<NA>` to 1, or applying some other form of basic math, such as `max(<NA>, 1)`?
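For the whole column, one common approach is to fill the missing values first and then apply the lower bound in a vectorized way; both `fillna` and `clip` accept the nullable `Int32` dtype. A minimal sketch (the extra `'qux'` row holding a 0 is made up to show the capping):

```python
import pandas as pd

shots = pd.DataFrame([2, pd.NA, 1, 0], index=['foo', 'bar', 'baz', 'qux'],
                     columns=['shots'], dtype='Int32')

# Step 1: replace <NA> with 1; step 2: raise anything below 1 up to 1
shots['shots'] = shots['shots'].fillna(1).clip(lower=1)
print(shots)
```

Filling before clipping keeps the two rules separate, so the fill value for `<NA>` could differ from the lower bound if needed.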
`idx` should be a collection (e.g. a list); if it is a scalar, `.loc` returns a scalar value (here `pd.NA`), which has no `fillna` method:
# idx = 'bar'
>>> shots.loc[idx, 'shots']
<NA>
>>> shots.loc[idx, 'shots'].fillna(1)
...
AttributeError: 'NAType' object has no attribute 'fillna'
>>> shots.loc[[idx], 'shots'].fillna(1)
bar 1
Name: shots, dtype: Int32
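Once `idx` is a list, the same fill-then-cap idea can be applied to just those rows and written back with `.loc`. A sketch assuming `idx` is a list of index labels (the labels and values here are only illustrative):

```python
import pandas as pd

shots = pd.DataFrame([2, pd.NA, 0], index=['foo', 'bar', 'baz'],
                     columns=['shots'], dtype='Int32')

idx = ['bar', 'baz']  # a list, so .loc returns a Series rather than a scalar
# Fill <NA> with 1, cap the minimum at 1, and write the result back to those rows
shots.loc[idx, 'shots'] = shots.loc[idx, 'shots'].fillna(1).clip(lower=1)
print(shots)
```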
The question is: how is `idx` defined?
Old answer
Your problem is not reproducible for me.
>>> shots = pd.DataFrame({'shots': [2, 1, pd.NA]}, dtype=pd.Int32Dtype())
>>> idx = [2]
>>> shots
shots
0 2
1 1
2 <NA>
>>> shots.dtypes
shots Int32
dtype: object
>>> shots.loc[idx, 'shots'].fillna(1)
2 1
Name: shots, dtype: Int32