Tags: mysql, sql, entity-attribute-value

Multiple possible data types for the same attribute: null entries, EAV, or store as varchar?


I'm creating a database for combustion experiments. Each experiment has some scientific metadata which I call 'details'. For example ('Fuel', 'C2H6') or ('Pressure', 120). Because the same detail names (like 'Fuel') show up a lot, I created a table just to store the names and units. Here's a simplified version:

CREATE TABLE properties (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    units NVARCHAR(15) NOT NULL DEFAULT 'dimensionless'
);

I also created a table called 'details' which maps 'properties' to values.

CREATE TABLE details (
    id INT AUTO_INCREMENT PRIMARY KEY,
    property_id INT NOT NULL,
    value VARCHAR(30),
    FOREIGN KEY(property_id) REFERENCES properties(id)
);

This isn't ideal because the value attribute is sometimes a chemical name and sometimes a float. In the future, there may even be new entries that have integer values. Storing everything in a VARCHAR seems wasteful. Since it'll be hard to change later, I want to make the right decision now.
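To make that concrete: with value stored as VARCHAR, even a simple numeric filter needs an explicit cast (a hedged sketch against the tables above; the DECIMAL(10,2) precision is my assumption):

-- A range query on 'Pressure' must cast the VARCHAR before
-- comparing, or the comparison is lexicographic.
SELECT d.id, CAST(d.value AS DECIMAL(10, 2)) AS pressure
FROM details d
JOIN properties p ON p.id = d.property_id
WHERE p.name = 'Pressure'
  AND CAST(d.value AS DECIMAL(10, 2)) > 100;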

I've been researching this for hours and have considered four options:

  1. Store everything as VARCHAR in value (simplest to develop).
  2. Use an EAV model (most complicated to develop).
  3. Create a column for each type (value_float, value_int, value_char) and accept plenty of NULL entries; see the sketch after this list.
  4. Use the JSON datatype.
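
As a rough illustration of option 3 (a sketch only; the CHECK constraint is optional and only enforced from MySQL 8.0.16 onward):

-- Option 3: one nullable column per type; exactly one should be
-- filled in any given row.
CREATE TABLE details (
    id INT AUTO_INCREMENT PRIMARY KEY,
    property_id INT NOT NULL,
    value_float DOUBLE NULL,
    value_int INT NULL,
    value_char VARCHAR(30) NULL,
    FOREIGN KEY (property_id) REFERENCES properties(id),
    -- Enforced from MySQL 8.0.16; earlier versions parse but ignore it.
    CHECK ((value_float IS NOT NULL) + (value_int IS NOT NULL) + (value_char IS NOT NULL) = 1)
);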

Looking into each one, it seems like they're all bad in different ways. (1) is bad because it takes extra space and I have to do extra work to parse strings back into numeric values. (2) is bad because of the huge increase in complexity (four extra tables and many more join operations), plus I hear EAV is to be avoided. (3) is a middle ground in complexity, but every row will carry two NULL values. (4) seems similar to (1), and I'm not sure how it might be better or worse (a sketch of it follows below).
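
For option 4, MySQL's JSON type (5.7+) at least preserves the distinction between strings and numbers inside the stored document. A hedged sketch, using JSON_TYPE to restrict the cast to numeric entries:

-- Option 4: a JSON value column; numbers stay numbers inside
-- the document, strings stay strings.
CREATE TABLE details (
    id INT AUTO_INCREMENT PRIMARY KEY,
    property_id INT NOT NULL,
    value JSON,
    FOREIGN KEY (property_id) REFERENCES properties(id)
);

INSERT INTO details (property_id, value) VALUES
    (1, JSON_QUOTE('C2H6')),  -- fuel name as a JSON string
    (2, CAST(120 AS JSON));   -- pressure as a JSON number

-- JSON_TYPE restricts the comparison to numeric entries,
-- so strings like 'C2H6' are filtered out.
SELECT id, CAST(value ->> '$' AS DECIMAL(10, 2)) AS pressure
FROM details
WHERE JSON_TYPE(value) IN ('INTEGER', 'DOUBLE', 'DECIMAL')
  AND CAST(value ->> '$' AS DECIMAL(10, 2)) > 100;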

I don't expect to have huge growth on this database or millions of entries. It just needs to be fast and searchable for researchers. I'm willing to have more backend complexity for a better/faster user experience.

By now I realize that there aren't many clear-cut answers in database design. I'm simply asking for some insight into my four options, or perhaps another option I haven't thought of.

EDIT: Added JSON as an option.


Solution

  • Well, you have to sacrifice something: disk space, performance, the specific-vs-general dimension, or the easy-vs-complex-to-develop dimension. Choose a mix suited to your needs and situation.
  • I solved this back in 2000 with a generalized EAV solution: the base record held the common properties shared by the majority of events, joined through an associative table to properties without values, and the very specific properties/values were stored in a BLOB as XML-like tags. That combined frequent properties with very specific ones. Since it was intended as a VERY GENERAL solution, you probably don't need it.
  • I'd sacrifice space; it's cheap today. Who cares if you take more space than is "correct according to data modeling theory"? The data model will be ugly, so what?
  • You'll still need to decide on the specific-vs-general dimension: how the specific attributes are handled, either as dedicated columns (yes, if they repeat often) or in a Property-TypeOfProperty-Value type of table (see the sketch below).
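
One way to read that last Property-TypeOfProperty-Value idea (my sketch, not the answerer's exact schema; the names value_type and numeric_details are made up here) is to tag each property with its value type and keep a single text column:

-- Hypothetical sketch: a type tag on the property tells the
-- application (or a view) how to interpret the stored text.
CREATE TABLE properties (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    units NVARCHAR(15) NOT NULL DEFAULT 'dimensionless',
    value_type ENUM('float', 'int', 'char') NOT NULL DEFAULT 'char'
);

-- Cast once in a view instead of in every query.
CREATE VIEW numeric_details AS
SELECT d.id, d.property_id, CAST(d.value AS DECIMAL(10, 2)) AS num_value
FROM details d
JOIN properties p ON p.id = d.property_id
WHERE p.value_type IN ('float', 'int');

Queries against numeric_details can then compare numbers directly, while text-valued details keep using the plain details table.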