Replacing strings with numerical values used to be easy, but since pandas 2.2 the simple approach below raises a warning. What is the "correct" way to do this now?
>>> s = pd.Series(["some", "none", "all", "some"])
>>> s.dtypes
dtype('O')
>>> s.replace({"none": 0, "some": 1, "all": 2})
FutureWarning: Downcasting behavior in `replace` is deprecated and will be
removed in a future version. To retain the old behavior, explicitly call
`result.infer_objects(copy=False)`. To opt-in to the future behavior, set
`pd.set_option('future.no_silent_downcasting', True)`
0 1
1 0
2 2
3 1
dtype: int64
If I understand the warning correctly, the object dtype is "downcast" to int64. Perhaps pandas wants me to do this explicitly, but I don't see how I could downcast a string to a numerical type before the replacement happens.
When you run:
s.replace({"none": 0, "some": 1, "all": 2})
the dtype of the output is currently int64, because pandas infers that the resulting values are all integers:
print(s.replace({"none": 0, "some": 1, "all": 2}).dtype) # int64
In a future pandas version this will no longer happen automatically; the dtype will remain object (you will still have integers, but stored as Python objects rather than as int64):
pd.set_option('future.no_silent_downcasting', True)
print(s.replace({"none": 0, "some": 1, "all": 2}).dtype) # object
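To make "integers as objects" concrete, here is a minimal sketch that does not depend on the replace behavior of any particular pandas version. It builds an object-dtype Series of integers by hand (the names `obj` and `converted` are illustrative, not from the question) and shows that `infer_objects` turns it into int64:

```python
import pandas as pd

# An object-dtype Series whose elements are plain Python ints,
# similar to what replace is expected to return in the future.
obj = pd.Series([1, 0, 2, 1], dtype=object)
print(obj.dtype)          # object
print(type(obj.iloc[0]))  # <class 'int'>

# infer_objects looks at the values and picks a better dtype if one fits.
converted = obj.infer_objects()
print(converted.dtype)    # int64
```

The values are the same in both Series; only the storage dtype differs.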
You will have to explicitly downcast the objects to integers (after the replacement):
s.replace({"none": 0, "some": 1, "all": 2}).infer_objects(copy=False)
print(s.replace({"none": 0, "some": 1, "all": 2})
.infer_objects(copy=False).dtype) # int64
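If every value in the Series appears in the mapping, one possible alternative (not part of the warning's own advice) is `Series.map`, which builds a new Series from the looked-up values instead of replacing inside an object Series. The sketch below assumes the same data as in the question; the names `mapping` and `mapped` are illustrative:

```python
import pandas as pd

s = pd.Series(["some", "none", "all", "some"], dtype=object)
mapping = {"none": 0, "some": 1, "all": 2}

# map looks up each element in the dict and returns a new Series.
mapped = s.map(mapping)
print(mapped.dtype)  # int64, because every value was found in the mapping
```

Note that the two methods differ for values missing from the mapping: `replace` leaves them unchanged, while `map` turns them into NaN (which also makes the result float64), so `map` is only a drop-in substitute when the mapping is complete.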