pandas type-conversion

How to convert strings of different units to floats of the same?


I have a pandas DataFrame of strings, but I want floats. The units in the strings differ, though: one column might be mostly in mm with a few values in cm, the next might be mostly in g with a few in kg, and so on.

How can I convert each column to floats in the same units? For example, if I'm converting everything to floats in mm, then "4.56 cm" should become 45.6, not 4.56.


Solution

  • Example

    We need a minimal, reproducible example to answer.

    import pandas as pd
    import numpy as np
    
    data = {'length': ['4.56 cm', '100 mm', '3.5 cm', '10 mm'], 
            'weight': ['200 g', '0.5 kg', '300 g', '2 kg']}
    df = pd.DataFrame(data)
    

    df:

        length  weight
    0  4.56 cm   200 g
    1   100 mm  0.5 kg
    2   3.5 cm   300 g
    3    10 mm    2 kg
    

    Code

    I have prepared an answer based on the example above. If your situation is different, please create and provide your own clear example.

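    # Strip the unit letters, leaving only the numeric part, and cast to float
    s1 = df['length'].str.replace('[A-Za-z]', '', regex=True).astype('float')
    # Rows given in cm are scaled to mm; rows already in mm pass through unchanged
    df['length(mm)'] = np.where(df['length'].str.contains('cm'), s1 * 10, s1)
    
    # Same approach for weight: rows given in kg are scaled to g
    s2 = df['weight'].str.replace('[A-Za-z]', '', regex=True).astype('float')
    df['weight(g)'] = np.where(df['weight'].str.contains('kg'), s2 * 1000, s2)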
    

    df:

        length  weight  length(mm)  weight(g)
    0  4.56 cm   200 g        45.6      200.0
    1   100 mm  0.5 kg       100.0      500.0
    2   3.5 cm   300 g        35.0      300.0
    3    10 mm    2 kg        10.0     2000.0
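
If a column can contain more than two units, the `np.where` approach above needs one condition per unit. A possible generalization, sketched here under assumptions, is to split each string into a number and a unit and look the unit up in a table of conversion factors. The helper `to_base_unit` and the tables `TO_MM` and `TO_G` are hypothetical names introduced for illustration, and the sketch assumes every value looks like `<number> <unit>`:

```python
import pandas as pd

# Hypothetical conversion tables: the factor that turns each unit into the target unit.
TO_MM = {'mm': 1.0, 'cm': 10.0, 'm': 1000.0}
TO_G = {'mg': 0.001, 'g': 1.0, 'kg': 1000.0}

def to_base_unit(col: pd.Series, factors: dict) -> pd.Series:
    """Split '<number> <unit>' strings and scale each number by its unit's factor."""
    parts = col.str.extract(r'^\s*(?P<value>[-+]?\d*\.?\d+)\s*(?P<unit>[A-Za-z]+)\s*$')
    # Units missing from the table map to NaN, so they show up as NaN
    # instead of silently producing a wrong number.
    return parts['value'].astype(float) * parts['unit'].map(factors)

df = pd.DataFrame({'length': ['4.56 cm', '100 mm', '3.5 cm', '10 mm'],
                   'weight': ['200 g', '0.5 kg', '300 g', '2 kg']})
df['length(mm)'] = to_base_unit(df['length'], TO_MM)
df['weight(g)'] = to_base_unit(df['weight'], TO_G)
```

On the example data this should give the same columns as the code above, up to floating-point rounding. Supporting another unit then only means adding an entry to the relevant table.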