Search code examples
pythonpandasdata-cleaning

Pandas - Remove strings from a float number in a column


I have a dataframe like the following:

plan type  hour status     code
A    cont   0    ok       010.0
A    cont   2    ok      025GWA
A    cont   0    notok   010VVT
A    cont   0    other     6.05
A    vend   1    ok        6.01

The column code has a few string characters with different letters. In the end I would like to transform the 'code' column to float. I tried:

df['code'] = df['code'].str.extract('(\d+)').astype(float)

but with this I got:

plan type  hour status     code
A    cont   0    ok        10.0
A    cont   2    ok        25.0 
A    cont   0    notok     10.0
A    cont   0    other      6.0
A    vend   1    ok         6.0

How can I get a result like the following?

plan type  hour status     code
A    cont   0    ok       10.00
A    cont   2    ok       25.00
A    cont   0    notok    10.00
A    cont   0    other     6.05
A    vend   1    ok        6.01

Solution

  • Use (\d*\.?\d*)

    In [441]: df['code'].str.extract('(\d*\.?\d*)', expand=False).astype(float)
    Out[441]:
    0    10.00
    1    25.00
    2    10.00
    3     6.05
    4     6.01
    Name: code, dtype: float64