Search code examples
mysqldatabasedatabase-designnormalizationdatabase-normalization

How to model attribute units in a database design?


I need to design a database table where most attributes have units. For example:

Readings
--------

id   load (kW)   fuel_consumption (tonnes) - etc
1    1154        89.4
2    1199        54.2

What's the recommended way to capture the units in the design? For example, I could:

  • store units within attribute names e.g. load_kW and fuel_consumption_tonnes
  • store units in a separate table e.g. each value becomes a foreign key to another table with columns for value and unit.
  • store outside the database: e.g. in business logic, or in documentation
  • are there others?

I happen to be using MySQL, but I assume this is a generic database normalisation problem.


Solution

  • Interesting question...

    There are two obvious routes:

    id   load_kW     fuel_consumption_tonnes
    --------------------------------------------------
    1    1154        89.4
    2    1199        54.2
    

    This is easy for humans to read, and fairly logical. However, if some readings are in "kilos", others in "tonnes", you have to convert those readings to fit into the "readings" table; this process MUST be "lossless", and idempotent. For instance, a reading of "89403 kilos" is not "89.4 tonnes", even though the business may choose to round from kilos to tonnes for convenience. There are usually some counter-intuitive rounding things that happen...

    If that's the case, you could change the schema:

    id      load load_unit    fuel_consumption fuel_consumption_unit
    --------------------------------------------------
    1    1154  kW          89403              kg
    2    1199  kW          54.2               t
    

    With a "unit" table, if you need it:

    unit_id    unit_name
    --------------------
    kg         kilogramme
    t          Tonne
    

    However, this model is open to human failure - it would be easy to change the "load_unit" column without modifying the "load" column, thus breaking the data. There's nothing you can really do to your data model to avoid this. It also makes common queries fairly tricky: imagine trying to retrieve the total of "load" in a consistent unit of measurement.

    I would recommend that in this case, you have two tables: "raw_readings", with the original data in the format above, and "normalized_readings", which you populate by converting all the readings to a consistent unit of measurement.