I have a pandas Series sliced from a DataFrame. The series has about 100,000 rows, where some of the values are floats and the others are infinity. But pandas reports the whole series as 'object' dtype, so when I try to remove the non-numeric values, the whole series becomes NaN.
Below is an example of how the table is structured, including every type of non-numeric entry. The symbol °° denotes infinity.
Time (µs) | ChannelA (mV) | ChannelB (mV) | ChannelC (mV) | ChannelD (mV) |
---|---|---|---|---|
1 | 0.1 | 0.2 | 0.3 | 0.4 |
2 | 0.5 | 0.7 | 0.4 | 0.5 |
3 | 0.6 | 0.2 | 0.3 | 0.11 |
4 | 0.8 | 0.6 | 0.7 | 0.6 |
5 | °° | °° | °° | °° |
Why does pandas treat the whole series as object dtype?
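A minimal sketch of why this happens, assuming the °° entries arrive as strings (the sample values below are hypothetical and only mirror the table above): a column has a single dtype, so one unparseable string is enough for pandas to fall back to storing generic Python objects.

```python
import pandas as pd

# Hypothetical sample data; '°°' is assumed to be stored as a string.
clean = pd.Series([0.1, 0.5, 0.6, 0.8])
mixed = pd.Series([0.1, 0.5, 0.6, 0.8, "°°"])

print(clean.dtype)  # float64
print(mixed.dtype)  # object

# Inside an object Series each element keeps its own Python type.
print(mixed.map(type).value_counts())
```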
You can convert each column to numeric using `pd.to_numeric`, passing `errors='coerce'`, which converts non-numeric string values to `NaN`.
import pandas as pd

for c in df:
    df[c] = pd.to_numeric(df[c], errors='coerce')
OUTPUT:
df
Time (µs) ChannelA (mV) ChannelB (mV) ChannelC (mV) ChannelD (mV)
0 1 0.1 0.2 0.3 0.40
1 2 0.5 0.7 0.4 0.50
2 3 0.6 0.2 0.3 0.11
3 4 0.8 0.6 0.7 0.60
4 5 NaN NaN NaN NaN
Data types after conversion:
>>> df.dtypes
Time (µs) int64
ChannelA (mV) float64
ChannelB (mV) float64
ChannelC (mV) float64
ChannelD (mV) float64
dtype: object
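Since the question is about a single Series rather than the whole DataFrame, the same conversion can be sketched on one column. The Series `s` below is a hypothetical stand-in for one channel, and `dropna` is just one possible way to discard the coerced rows afterwards.

```python
import pandas as pd

# Hypothetical Series standing in for one channel sliced from the DataFrame.
s = pd.Series([0.1, 0.5, 0.6, 0.8, "°°"], name="ChannelA (mV)")

numeric = pd.to_numeric(s, errors="coerce")  # '°°' becomes NaN
cleaned = numeric.dropna()                   # keep only the rows that parsed

print(numeric.dtype)  # float64
print(len(cleaned))   # 4
```

Only the entries that could not be parsed become NaN, so the rest of the series keeps its values.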
If you don't want to use `NaN`, you can instead replace the `°°` characters with `inf`, which represents an infinite value, using `replace`, and then convert the DataFrame to `float`.
df.replace('°°', float('inf')).astype(float)
Time (µs) ChannelA (mV) ChannelB (mV) ChannelC (mV) ChannelD (mV)
0 1.0 0.1 0.2 0.3 0.40
1 2.0 0.5 0.7 0.4 0.50
2 3.0 0.6 0.2 0.3 0.11
3 4.0 0.8 0.6 0.7 0.60
4 5.0 inf inf inf inf
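Once the values are real floats, the infinite entries can be filtered with NumPy. This is a small sketch on a hypothetical single-channel Series; `np.isfinite` is one option for the mask, and it also excludes NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical Series standing in for one channel sliced from the DataFrame.
s = pd.Series([0.1, 0.5, 0.6, 0.8, "°°"], name="ChannelA (mV)")

# Swap the marker for a real infinity, then cast the column to float.
as_float = s.replace("°°", np.inf).astype(float)

# Boolean mask that keeps only finite values (drops inf, -inf and NaN).
finite_only = as_float[np.isfinite(as_float)]

print(as_float.dtype)    # float64
print(len(finite_only))  # 4
```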