I have a DataFrame with the following column types, and its column sensor_1 looks like this:
>>> df.dtypes
location object
sensor_1 object
sensor_2 float64
>>> df['sensor_1'].head(4)
0 3 m3/h
1 NaN
2 NaN
3 NaN
Name: sensor_1, dtype: object
>>> type(df['sensor_1'][0])
str
>>> type(df['sensor_1'][1])
float
My goal is to keep only the numeric part of "sensor_1" and have it recognised as float,
taking into consideration the fact that there are nulls, which, as I understand it,
are already recognised as numeric (they are float NaN).
I tried a few things which did not work:
pd.to_numeric(df['sensor_1'], errors='coerce') #it did not change anything
df['sensor_1'].apply(lambda x: x.str[:-5].astype(float) if pd.notnull(x) else x)
#tried to strip the last 5 characters if not null and then convert the remaining part to float
AttributeError: 'str' object has no attribute 'str'
df['sensor_1'].to_string() #unsure how to go on from there
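The first two attempts fail for different reasons. pd.to_numeric returns a new Series instead of modifying df in place, and with errors='coerce' a value such as "3 m3/h" is not parseable, so it becomes NaN. Inside apply, x is a single Python str (or a float NaN), not a Series, so it has no .str accessor; plain slicing works instead. A minimal sketch of the per-element fix, using hypothetical sample data and assuming the unit suffix is always the five characters " m3/h":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question: one reading, then missing values.
df = pd.DataFrame({"sensor_1": ["3 m3/h", np.nan, np.nan, np.nan]})

# to_numeric returns a new Series; "3 m3/h" is not a number, so coerce yields NaN.
coerced = pd.to_numeric(df["sensor_1"], errors="coerce")

# Inside apply, x is a plain str (or float NaN), so slice it directly instead of
# using the .str accessor. Assumes a fixed 5-character suffix " m3/h".
fixed = df["sensor_1"].apply(lambda x: float(x[:-5]) if pd.notnull(x) else x)

print(coerced.tolist())
print(fixed.tolist())
```

The slicing approach breaks as soon as a unit of a different length appears, which is why a pattern-based extraction is more robust.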
So... I'm running out of ideas and asking for your help. Thank you ^_^
Use Series.str.extract, but first convert the values to strings and finally to floats:
# Raw string with a single capture group, so expand=False returns a Series
df['sensor_1'] = (df['sensor_1'].astype(str)
                  .str.extract(r'(\d+\.?\d*)', expand=False)
                  .astype(float))
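A self-contained run of this approach on hypothetical sample data (the rows below are made up for illustration, including a decimal reading). Converting with astype(str) turns NaN into the string "nan", which contains no digits, so extract returns NaN for those rows; for "3 m3/h" it returns the first match, "3", so the digit in the unit "m3" is ignored. This sketch assumes the number comes before the unit and does not handle negative values or scientific notation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: readings with units, missing values, a decimal reading.
df = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "sensor_1": ["3 m3/h", np.nan, "2.5 m3/h", np.nan],
})

df["sensor_1"] = (df["sensor_1"].astype(str)                  # NaN -> "nan"
                  .str.extract(r"(\d+\.?\d*)", expand=False)  # first number, else NaN
                  .astype(float))                             # "3" -> 3.0, "2.5" -> 2.5

print(df["sensor_1"].tolist())  # [3.0, nan, 2.5, nan]
print(df.dtypes["sensor_1"])    # float64
```

Because extract works on the whole column at once, there is no need for a per-row apply or for the suffix to have a fixed length.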