Search code examples
pythonexcelstringtextextract

Python extract unit of measure and attached number from column of string in Excel


I have to extract for each string in over 1000 of rows, the unit of measure and the associated number, like:

25 kg, 1000L

The string looks like this:

ZALTATA MA177-445 IBCC 1000L

The number is before the measurement unit.


Solution

  • Assuming this example input:

                                  col
    0    ZALTATA MA177-445 IBCC 1000L
    1  ZALTATA MA177 445 kg IBCC 1000
    

    you could use extract:

    import re
    df['col'].str.extract(r'\b(?P<value>\d+)\s*(?P<unit>kg|l)\b', flags=re.I)
    

    output:

      value unit
    0  1000    L
    1   445   kg
    

    join and save

    (df.join(df['col'].str.extract(r'\b(?P<value>\d+)\s*(?P<unit>kg|l)\b', flags=re.I))
       .to_excel('out.xslx')
    )