I've got some data I'm reading into Python using Pandas and want to keep track of units with the Pint package. The values span a range of scales, so the units are mixed, e.g. lengths are mostly meters but some are centimeters.
For example, given the data:
what,length
foo,5.3 m
bar,72 cm
and I'd like to end up with the length column in some form that Pint understands. Pint's Pandas integration suggests that a whole column has to share one dtype, and so one unit, which seems reasonable. I'm happy for some arbitrary unit to be picked (e.g. the first, the most common, or just the SI base unit) with everything expressed in terms of it.
I was expecting some nice way of getting from the data I have to what's expected, but I don't see anything.
import pandas as pd
import pint_pandas
length = pd.Series(['5.3 m', "72 cm"], dtype='pint[m]')
This doesn't do the right thing at all. For example:
length * 2
outputs
0    5.3 m5.3 m
1    72 cm72 cm
dtype: pint[meter]
so it's just leaving the values as strings (multiplying by 2 simply repeats each string). Calling length.pint.convert_object_dtype() doesn't help and everything stays as strings.
Going through the examples, it looks like pint_pandas is expecting numbers rather than strings. You can use apply to do the conversion:
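from pint import UnitRegistry
ureg = UnitRegistry()
# df holds the sample data from the question, with "length" still as strings.
# Parse each string into a Quantity, then cast the column to the meter dtype.
df["length"].apply(lambda i: ureg(i)).astype("pint[m]")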
However, why keep the column as Quantity objects instead of just plain float numbers?