Tags: python, pandas, dataframe, error-handling, deprecation-warning

How to convert a pandas DataFrame to numeric in a future-proof way?


Until now, I converted all values in a pandas DataFrame containing mixed numeric and string entries to numeric values where possible, in one easy step, using .map and pd.to_numeric with errors='ignore'.

It worked perfectly, but after updating to pandas 2.2.3 (the latest version at the time) I get a FutureWarning:

import pandas as pd
A = pd.DataFrame({
    'x' : ['1','2','3'],
    'y' : ['not_a_number','5',9999],
    }) # example data
B = A.map(pd.to_numeric, errors = 'ignore')

# FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing errors and catch exceptions explicitly instead
#   B = A.map(pd.to_numeric, errors = 'ignore')

How can I write this in a future-proof, elegant, vectorised way?

I could not come up with any solution that isn't very cumbersome (e.g. looping over every individual entry of the DataFrame).


Solution

  • When you use errors='ignore', to_numeric returns its input unchanged (here, the original Series) whenever parsing fails.

    As mentioned in the documentation:

    errors {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’

    • If ‘raise’, then invalid parsing will raise an exception.
    • If ‘coerce’, then invalid parsing will be set as NaN.
    • If ‘ignore’, then invalid parsing will return the input.

    Changed in version 2.2. “ignore” is deprecated. Catch exceptions explicitly instead.
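A quick check of the two remaining modes on column y from the question. This is a sketch; the exact error message printed for errors='raise' may differ between pandas versions.

```python
import math
import pandas as pd

s = pd.Series(['not_a_number', '5', 9999])  # column 'y' from the question

# errors='raise' (the default): the unparseable entry raises ValueError
try:
    pd.to_numeric(s)
except ValueError as exc:
    raised = True
    print("raise  ->", exc)
else:
    raised = False

# errors='coerce': unparseable entries become NaN, everything else is converted
coerced = pd.to_numeric(s, errors='coerce')
print("coerce ->", coerced.tolist())
```

Note that 'coerce' loses the original string, and the NaN forces the column to float64, so it is not a drop-in replacement for the old 'ignore' behaviour.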

    Catch the error explicitly if you want to keep the previous behavior:

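    def to_numeric(s):
        # Try a strict conversion; if it fails, return the input unchanged
        # (this mirrors the deprecated errors='ignore' behaviour)
        try:
            return pd.to_numeric(s, errors='raise')
        except ValueError:
            return s

    A.apply(to_numeric)  # column-wise: each column is converted as a whole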

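    NB. Use apply rather than map for a vectorised operation: apply passes whole columns to pd.to_numeric, whereas map calls it once per cell. As a consequence, a column is converted only if all of its values can be parsed; in the example, column y keeps '5' as a string because 'not_a_number' cannot be parsed.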

    Relevant issues: #54467, #59221.