For a pd.Series of strings that mixes numbers (integers and floats) with other text, I need to identify all non-numeric elements. For example:
data = pd.Series(['1','wrong value','2.5','-3000','>=50','not applicable', '<40.5'])
I want it to return the following elements:
wrong value
>=50
not applicable
<40.5
What I'm currently doing is:
data[~data.str.replace(r'[.\-]', '', regex=True).str.isnumeric()]
That is, because .str.isnumeric() returns False for strings that contain a decimal point or a negative sign, I strip "." and "-" first and then look for the non-numeric fields.
Is there a better way of doing this? Or is there any potential problem/warning with my current method? Thanks!!
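Use pd.to_numeric with errors='coerce' to flag them: values that cannot be parsed as numbers become NaN, which isna() then picks out.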
data[pd.to_numeric(data, errors='coerce').isna()]
Out[1159]:
1 wrong value
4 >=50
5 not applicable
6 <40.5
dtype: object
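As for potential problems with the strip-and-isnumeric approach: it removes "." and "-" wherever they appear, so strings like '1-2' or '1.2.3' end up looking numeric. Below is a minimal sketch of the difference. The `tricky` and `with_missing` series are made-up examples and not part of the original data. It also shows one way to keep missing values that were already in the data from being flagged, assuming you only care about entries that are present but unparseable.

```python
import pandas as pd

data = pd.Series(['1', 'wrong value', '2.5', '-3000', '>=50', 'not applicable', '<40.5'])

# Unparseable entries become NaN; select the rows where that happened.
non_numeric = data[pd.to_numeric(data, errors='coerce').isna()]

# Hypothetical edge cases: malformed strings made only of digits, '.' and '-'.
tricky = pd.Series(['1-2', '1.2.3', '7'])
regex_flags = ~tricky.str.replace(r'[.\-]', '', regex=True).str.isnumeric()
coerce_flags = pd.to_numeric(tricky, errors='coerce').isna()

# Hypothetical series with a value that was already missing.
# The extra notna() keeps it out of the "non-numeric" result.
with_missing = pd.Series(['1', None, 'x'])
mask = pd.to_numeric(with_missing, errors='coerce').isna() & with_missing.notna()

print(non_numeric)
print(regex_flags.tolist())   # the regex approach misses '1-2' and '1.2.3'
print(coerce_flags.tolist())  # to_numeric flags both
print(with_missing[mask])
```

Note that pd.to_numeric is more permissive than a digits-only check in other ways too, so it is worth testing it against the kinds of values you expect to see.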