python, pandas, data-cleaning

Converting object-type columns to a numeric type using pandas


I am trying to clean some data using pandas. When I execute df.datatypes, it shows that the columns are of type object. I want to convert them to numeric types. I have tried several approaches, such as:

data[['a','b']] = data[['a','b']].apply(pd.to_numeric, errors='ignore')

Then,

data['c'] = data['c'].infer_objects()

But nothing seems to work. The interpreter does not throw any error, but it does not perform the desired conversion either.

Any help will be greatly appreciated.

Thanks in advance.


Solution

  • From the help page of to_numeric, the description for errors is as follows:

    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
            - If 'raise', then invalid parsing will raise an exception
            - If 'coerce', then invalid parsing will be set as NaN
            - If 'ignore', then invalid parsing will return the input
    

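    If your apply call returns the input unchanged, the columns contain values that cannot be parsed as numbers. With errors='ignore', to_numeric hands back the original column in that case instead of raising, which is why you see neither an error nor a conversion.

    Try the second option, errors='coerce', which converts every value it can and sets the rest to NaN.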

    df = df.apply(pd.to_numeric, errors='coerce')
    

    Or,

    for c in df.columns:
        df[c] = pd.to_numeric(df[c], errors='coerce')
    

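    Also, infer_objects performs only soft type-casting: it converts an object column whose values are already Python numbers, but it does not parse strings such as '1' into numbers. To check column dtypes, use df.dtypes; a DataFrame has no datatypes attribute.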
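
To make the difference concrete, here is a minimal sketch on made-up data. The DataFrame contents and the names `converted`, `n_failed`, `strings`, and `numbers` are assumptions for illustration, not taken from the question. It shows errors='coerce' turning an unparseable value into NaN, and infer_objects helping only when the object column already holds real numbers.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data: numbers stored as strings, with one bad value ("x").
df = pd.DataFrame({"a": ["1", "2", "x"], "b": ["1.5", "2.5", "3.5"]})

# errors='coerce' parses what it can and replaces the rest with NaN.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)

# Count the values that were present before but became NaN after conversion,
# so you can see how much data failed to parse in each column.
n_failed = (converted.isna() & df.notna()).sum()
print(n_failed)

# infer_objects does not parse strings; it only tightens the dtype of an
# object column whose values are already numbers.
strings = pd.Series(["1", "2"], dtype=object)
numbers = pd.Series([1, 2], dtype=object)
print(is_numeric_dtype(strings.infer_objects()))  # strings are not parsed
print(is_numeric_dtype(numbers.infer_objects()))  # real numbers are recognised
```

Checking `n_failed` before overwriting the original columns is a cheap way to spot columns where coercion would silently discard a lot of values.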