python regex pandas string text-processing

Locate all non-number elements in a pandas.Series


For a pd.Series of strings, some of which represent numbers (integers and floats) and some of which do not, I need to identify all non-numeric elements. For example:

data = pd.Series(['1','wrong value','2.5','-3000','>=50','not applicable', '<40.5'])

I want it to return the following elements:

wrong value
>=50
not applicable
<40.5

What I'm currently doing is:

data[~data.str.replace(r'[.\-]', '', regex=True).str.isnumeric()]

That is, because .str.isnumeric() returns False for strings that contain a decimal point or a minus sign, I first strip "." and "-" and then pick out the entries that are still non-numeric.

Is there a better way of doing this? Or is there any potential problem/warning with my current method? Thanks!!
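
On the question of potential problems, here is a minimal sketch of where stripping "." and "-" can misfire. The sample values in `tricky` are hypothetical, chosen for illustration and not taken from the data above; `pd.to_numeric` is used only as a point of comparison.

```python
import pandas as pd

# Hypothetical edge cases, not part of the original data.
tricky = pd.Series(['1.2.3', '--5', '3-4', '1e3'])

# Masking approach: remove "." and "-" and test what is left.
stripped = tricky.str.replace(r'[.\-]', '', regex=True)
looks_numeric = stripped.str.isnumeric()

# Comparison: does the whole string actually parse as a number?
parses = pd.to_numeric(tricky, errors='coerce').notna()

print(pd.DataFrame({'value': tricky,
                    'masking': looks_numeric,
                    'to_numeric': parses}))
```

The masking approach ignores where the "." and "-" characters sit, so malformed strings such as '1.2.3' or '3-4' pass as numeric, while a valid number in scientific notation such as '1e3' is rejected.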


Solution

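  • Use pd.to_numeric with errors='coerce' to flag them: any value that cannot be parsed as a number becomes NaN, and .isna() then selects those rows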

    data[pd.to_numeric(data, errors='coerce').isna()]
    
    Out[1159]:
    1       wrong value
    4              >=50
    5    not applicable
    6             <40.5
    dtype: object
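
One caveat worth sketching: a value that is already missing also comes out of `pd.to_numeric` as NaN, so it would be flagged alongside the unparseable strings. The example below assumes you want to exclude such values; the trailing `np.nan` is added here for illustration and is not in the original data.

```python
import numpy as np
import pandas as pd

# Original sample plus one missing value (an added assumption).
data = pd.Series(['1', 'wrong value', '2.5', '-3000', '>=50',
                  'not applicable', '<40.5', np.nan])

# Unparseable strings become NaN; real numbers are converted.
converted = pd.to_numeric(data, errors='coerce')

# Keep rows that failed to parse but were not missing to begin with.
non_numeric = data[converted.isna() & data.notna()]
print(non_numeric)
```

If missing values should be reported too, drop the `& data.notna()` condition.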