How can I apply pandas.to_numeric to a subset of a DataFrame selected using .loc[]? For example, consider this DataFrame:
import pandas as pd

df = pd.DataFrame(index=pd.Index([1, 2, 3]))
df['X'] = ['a', 'a', 'b']
df['Y'] = [1, 2, 3]
df['Z'] = [4, 5, 6]
df['Y'] = df['Y'].astype(object)
df['Z'] = df['Z'].astype(object)
df
   X  Y  Z
1  a  1  4
2  a  2  5
3  b  3  6
Notice that the dtype of the Y and Z columns is object.
I would like to apply pandas.to_numeric to columns Y and Z to change their data type to int. I tested 3 approaches:
df.loc[:, 'Y'] = df.loc[:, 'Y'].apply(pd.to_numeric) # (1) WORKS
df.loc[:, 'Z'] = df.loc[:, 'Z'].apply(pd.to_numeric) # (1) WORKS
df.loc[:, ['Y', 'Z']] = df.loc[:, ['Y', 'Z']].apply(pd.to_numeric) # (2) DOESN'T WORK
df.loc[:, 'Y':'Z'] = df.loc[:, 'Y':'Z'].apply(pd.to_numeric) # (3) DOESN'T WORK
Approaches (2) and (3) don't work with pd.to_numeric, but they do work with other functions, e.g.
df.loc[:, 'Y':'Z'] = df.loc[:, 'Y':'Z'].apply(lambda x: x*0)
correctly sets the Y and Z columns to zero. Can someone explain why it does not work with pandas.to_numeric?
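One way to narrow this down is to look at the right-hand side on its own, before any assignment happens. The sketch below rebuilds the example frame and stores the converted copy in a variable called rhs (a name chosen here for illustration, not part of the original code). If rhs already has integer dtypes, the conversion itself succeeded and the dtype must be getting lost in the assignment step.

```python
import pandas as pd

# Rebuild the example frame with object-typed Y and Z columns.
df = pd.DataFrame(
    {"X": ["a", "a", "b"], "Y": [1, 2, 3], "Z": [4, 5, 6]},
    index=[1, 2, 3],
)
df["Y"] = df["Y"].astype(object)
df["Z"] = df["Z"].astype(object)

# Convert the selected columns, but do not assign the result back yet.
rhs = df.loc[:, ["Y", "Z"]].apply(pd.to_numeric)

print(rhs.dtypes)  # dtypes of the converted copy
print(df.dtypes)   # dtypes of the original frame, untouched so far
```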
EDIT
It turns out that this behavior is intended, as there is a difference between .loc[:, ...] and []. According to the documentation:
Note: When trying to convert a subset of columns to a specified type using astype() and loc(), upcasting occurs. loc() tries to fit in what we are assigning to the current dtypes, while [] will overwrite them taking the dtype from the right hand side.
Therefore, the type should be changed using [], as suggested in jezrael's answer. More info is available in the documentation.
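To see the difference side by side, here is a minimal sketch that applies the same conversion through [] and through .loc on two fresh copies of the example frame. make_frame is a hypothetical helper introduced only for this illustration. The resulting dtypes in the .loc case can depend on the pandas version, so the code prints them rather than assuming a particular result.

```python
import pandas as pd


def make_frame():
    """Hypothetical helper: rebuild the example frame with object-typed Y and Z."""
    frame = pd.DataFrame(
        {"X": ["a", "a", "b"], "Y": [1, 2, 3], "Z": [4, 5, 6]},
        index=[1, 2, 3],
    )
    frame["Y"] = frame["Y"].astype(object)
    frame["Z"] = frame["Z"].astype(object)
    return frame


# [] replaces the columns, so the dtype comes from the right-hand side.
via_brackets = make_frame()
via_brackets[["Y", "Z"]] = via_brackets[["Y", "Z"]].apply(pd.to_numeric)

# .loc writes the values into the existing columns instead.
via_loc = make_frame()
via_loc.loc[:, ["Y", "Z"]] = via_loc.loc[:, ["Y", "Z"]].apply(pd.to_numeric)

print(via_brackets.dtypes)
print(via_loc.dtypes)  # may still show object for Y and Z, depending on the version
```

In both cases the values themselves are the same; only the column dtypes may differ.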
This looks like a bug to me.
The following works for me:
df[['Y', 'Z']] = df[['Y', 'Z']].apply(pd.to_numeric)
print(df.dtypes)
X    object
Y     int64
Z     int64
dtype: object
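As a possible alternative (not part of the answer above), DataFrame.astype accepts a dict that maps column names to dtypes and returns a new frame, which avoids the assignment question altogether. This is only a sketch, and it assumes every value in Y and Z can be cast to int64; unlike pd.to_numeric, astype has no errors="coerce" option, so an unparseable value would raise.

```python
import pandas as pd

# Rebuild the example frame with object-typed Y and Z columns.
df = pd.DataFrame(
    {"X": ["a", "a", "b"], "Y": [1, 2, 3], "Z": [4, 5, 6]},
    index=[1, 2, 3],
)
df["Y"] = df["Y"].astype(object)
df["Z"] = df["Z"].astype(object)

# astype with a dict converts only the listed columns and returns a new frame.
converted = df.astype({"Y": "int64", "Z": "int64"})

print(converted.dtypes)
```

Here df itself is left unchanged, so the result has to be bound to a name (converted is just an illustrative choice) or assigned back to df.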