python, pandas, pandas-loc

Apply pandas.to_numeric to selected subset of columns using loc in pandas DataFrame


How can I apply pandas.to_numeric to a subset of a DataFrame selected with .loc[]? For example, consider this DataFrame:

import pandas as pd

df = pd.DataFrame(index=pd.Index([1, 2, 3]))
df['X'] = ['a', 'a', 'b']
df['Y'] = [1, 2, 3]
df['Z'] = [4, 5, 6]
df['Y'] = df['Y'].astype(object)
df['Z'] = df['Z'].astype(object)
df
    X   Y   Z
1   a   1   4
2   a   2   5
3   b   3   6

Notice that the dtype of columns Y and Z is object. I would like to apply pandas.to_numeric to columns Y and Z to change their dtype to int. I tested 3 approaches:

df.loc[:, 'Y'] = df.loc[:, 'Y'].apply(pd.to_numeric) # (1) WORKS
df.loc[:, 'Z'] = df.loc[:, 'Z'].apply(pd.to_numeric) # (1) WORKS

df.loc[:, ['Y', 'Z']] = df.loc[:, ['Y', 'Z']].apply(pd.to_numeric) # (2) DOESN'T WORK

df.loc[:, 'Y':'Z'] = df.loc[:, 'Y':'Z'].apply(pd.to_numeric) # (3) DOESN'T WORK

Approaches (2) and (3) don't work with pd.to_numeric, but they do work with other functions, e.g.

df.loc[:, 'Y':'Z'] = df.loc[:, 'Y':'Z'].apply(lambda x: x*0)

correctly sets the Y and Z columns to zero. Can someone explain why this does not work with pandas.to_numeric?

EDIT

Finally, it turns out that this behavior is intended, as there is a difference between .loc[:, ...] and []. According to the documentation:

Note: When trying to convert a subset of columns to a specified type using astype() and loc(), upcasting occurs. loc() tries to fit in what we are assigning to the current dtypes, while [] will overwrite them taking the dtype from the right hand side.

Therefore, the dtype should be changed using [], as suggested in jezrael's answer. More info in the documentation.
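
To make the difference concrete, here is a minimal sketch that rebuilds the DataFrame from the question and tries both assignment styles on separate copies. The names via_loc and via_brackets are only illustrative. What the .loc version prints may differ between pandas versions, so no expected output is shown for it. The [] version takes its dtypes from the right-hand side.

```python
import pandas as pd

# Rebuild the DataFrame from the question, with Y and Z stored as object.
df = pd.DataFrame(index=pd.Index([1, 2, 3]))
df['X'] = ['a', 'a', 'b']
df['Y'] = [1, 2, 3]
df['Z'] = [4, 5, 6]
df['Y'] = df['Y'].astype(object)
df['Z'] = df['Z'].astype(object)

# Assignment through .loc tries to fit the new values into the existing columns.
via_loc = df.copy()
via_loc.loc[:, ['Y', 'Z']] = via_loc.loc[:, ['Y', 'Z']].apply(pd.to_numeric)
print(via_loc.dtypes)  # result depends on the pandas version

# Assignment through [] replaces the columns, taking the dtype of the right-hand side.
via_brackets = df.copy()
via_brackets[['Y', 'Z']] = via_brackets[['Y', 'Z']].apply(pd.to_numeric)
print(via_brackets.dtypes)  # Y and Z are int64
```

Working on copies keeps the original df unchanged, so the two approaches can be compared side by side.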


Solution

  • This looks like a bug.

    This works for me:

    df[['Y', 'Z']] = df[['Y', 'Z']].apply(pd.to_numeric)
    print(df.dtypes)
    X    object
    Y     int64
    Z     int64
    dtype: object
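
The question also selected the columns with a label slice ('Y':'Z'), which plain [] does not accept for columns. One possible way around this, sketched below, is to resolve the slice to column names with .loc first and then assign through []. The helper to_numeric_columns is a hypothetical name introduced here for illustration; it is not part of pandas.

```python
import pandas as pd


def to_numeric_columns(df, columns):
    """Hypothetical helper: return a copy of df with `columns` converted via pd.to_numeric."""
    out = df.copy()
    cols = list(columns)
    # Assign through [] so the converted dtypes replace the old ones.
    out[cols] = out[cols].apply(pd.to_numeric)
    return out


df = pd.DataFrame(index=pd.Index([1, 2, 3]))
df['X'] = ['a', 'a', 'b']
df['Y'] = [1, 2, 3]
df['Z'] = [4, 5, 6]
df['Y'] = df['Y'].astype(object)
df['Z'] = df['Z'].astype(object)

# Resolve the label slice 'Y':'Z' to explicit column names.
cols = df.loc[:, 'Y':'Z'].columns
converted = to_numeric_columns(df, cols)
print(converted.dtypes)
```

Returning a copy rather than modifying df in place is a design choice made for this sketch; assigning the result back to df gives the same effect as the one-liner in the answer.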