I have a pandas DataFrame of strings, but I want floats. The units in the strings differ, though: one column might be mostly in mm with a few values in cm, the next might be mostly in g with a few in kg, and so on.
How can I convert each column to floats in a single unit? For example, if I'm converting everything to floats in mm, then "4.56 cm" should become 45.6, not 4.56.
Example
We need a minimal, reproducible example to answer this.
import pandas as pd
import numpy as np
data = {'length': ['4.56 cm', '100 mm', '3.5 cm', '10 mm'],
        'weight': ['200 g', '0.5 kg', '300 g', '2 kg']}
df = pd.DataFrame(data)
df:
    length  weight
0  4.56 cm   200 g
1   100 mm  0.5 kg
2   3.5 cm   300 g
3    10 mm    2 kg
Code
I have prepared an answer based on this example. If your situation is different, please create and provide your own clear example.
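# Strip the unit letters and parse what remains as a float
s1 = df['length'].str.replace('[A-Za-z]', '', regex=True).astype('float')
# Rows given in cm are multiplied by 10 to get mm; mm rows stay as they are
df['length(mm)'] = np.where(df['length'].str.contains('cm'), s1 * 10, s1)
s2 = df['weight'].str.replace('[A-Za-z]', '', regex=True).astype('float')
# Rows given in kg are multiplied by 1000 to get g; g rows stay as they are
df['weight(g)'] = np.where(df['weight'].str.contains('kg'), s2 * 1000, s2)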
df:
    length  weight  length(mm)  weight(g)
0  4.56 cm   200 g        45.6      200.0
1   100 mm  0.5 kg       100.0      500.0
2   3.5 cm   300 g        35.0      300.0
3    10 mm    2 kg        10.0     2000.0
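If a column can contain more than two units, one np.where per unit gets unwieldy. Below is a minimal sketch of a more general variant of the same idea, assuming every cell looks like "<number> <unit>". The helper name to_base_unit and the tables TO_MM and TO_G are hypothetical names made up for this illustration (including the extra "m" entry), not part of pandas. It splits each string with Series.str.extract and looks up a multiplier for the unit with Series.map.

```python
import pandas as pd

# Hypothetical conversion tables: multiplier that takes each unit to the target unit.
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}
TO_G = {"g": 1.0, "kg": 1000.0}

def to_base_unit(series, factors):
    """Split '<number> <unit>' strings and scale each number by its unit's factor."""
    parts = series.str.extract(
        r"^\s*(?P<value>[-+]?\d*\.?\d+)\s*(?P<unit>[A-Za-z]+)\s*$"
    )
    values = parts["value"].astype(float)
    # Units missing from the table map to NaN, so unexpected units show up
    # as NaN in the result instead of being silently left unconverted.
    scale = parts["unit"].map(factors)
    return values * scale

df = pd.DataFrame({
    "length": ["4.56 cm", "100 mm", "3.5 cm", "10 mm"],
    "weight": ["200 g", "0.5 kg", "300 g", "2 kg"],
})
df["length(mm)"] = to_base_unit(df["length"], TO_MM)
df["weight(g)"] = to_base_unit(df["weight"], TO_G)
```

For the example data this should give the same columns as the np.where version, up to floating-point rounding. Supporting another unit then only means adding an entry to the relevant table.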