Tags: python, pandas, units-of-measurement, custom-formatting

How can I manage units in pandas data?


I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame that looks like this:

   length (m)  width (m)  thickness (cm)
0         1.2        3.4             5.6
1         7.8        9.0             1.2
2         3.4        5.6             7.8

Currently, the measurement units are encoded in column names. Downsides include:

  1. column selection is awkward -- df['width (m)'] vs. df['width']
  2. things will likely break if the units of my source data change

If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?


Solution

  • There isn't a great way to do this right now; see the GitHub issue here for some discussion.

    As a quick hack, you could do something like this, maintaining a separate dict that maps each column name to its unit.

    In [3]: units = {}
    
    In [5]: newcols = []
       ...: for col in df:
       ...:     name, unit = col.split(' ')
       ...:     units[name] = unit
       ...:     newcols.append(name)
    
    In [6]: df.columns = newcols
    
    In [7]: df
    Out[7]:
       length  width  thickness
    0     1.2    3.4        5.6
    1     7.8    9.0        1.2
    2     3.4    5.6        7.8
    
    In [8]: units['length']
    Out[8]: '(m)'
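
The session above keeps the parentheses in each unit string and assumes every column name contains exactly one space. A slightly more defensive variant might look like the sketch below. The helper name `split_units` and the regular expression are illustrative assumptions, not part of pandas. It also stores the mapping in `DataFrame.attrs`, which pandas documents as experimental, so treat that part as optional: the metadata may not survive every operation.

```python
import re

import pandas as pd

# Matches "name (unit)", e.g. "thickness (cm)" -> name "thickness", unit "cm".
_COLUMN_PATTERN = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<unit>[^()]*)\)\s*$")


def split_units(df):
    """Return a copy of df with unit-free column names, plus a {name: unit} dict.

    Hypothetical helper: columns without a trailing "(unit)" keep their
    name and get a unit of None.
    """
    names, units = [], {}
    for col in df.columns:
        match = _COLUMN_PATTERN.match(str(col))
        if match:
            name, unit = match.group("name"), match.group("unit")
        else:
            name, unit = str(col), None
        names.append(name)
        units[name] = unit

    out = df.copy()
    out.columns = names
    # DataFrame.attrs is experimental in pandas; the separate dict returned
    # below remains the reliable copy of the unit information.
    out.attrs["units"] = units
    return out, units


df = pd.DataFrame(
    {
        "length (m)": [1.2, 7.8, 3.4],
        "width (m)": [3.4, 9.0, 5.6],
        "thickness (cm)": [5.6, 1.2, 7.8],
    }
)
clean, units = split_units(df)
print(list(clean.columns))
print(units["thickness"])
```

Returning the dict alongside the frame keeps the units available even if `attrs` is dropped by a later operation. The original `df` is left untouched, so the source column names can still be inspected if the units in the input data change.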