How to store data with dynamic number of attributes in a database

I have a number of different objects with a varying number of attributes. Until now I have saved the data in XML files, which easily allow for an ever-changing number of attributes. Now I am trying to move the data to a database.

What would be your preferred way to store this data?

A few strategies I have identified so far:

  • Having a single field named "attributes" in the object's table and storing the data there, serialized or as JSON.
  • Storing the data in two tables (objects, attributes) and using a third table to save the relations, making it a true n:m relation. A very clean solution, but possibly very expensive when fetching an entire object and all its attributes.
  • Identifying the attributes all objects have in common and creating fields for these in the object's table, then storing the remaining attributes as serialized data in another field. This has an advantage over the first strategy in that it makes searches easier.

Any ideas?


  • If you ever plan on searching for specific attributes, it's a bad idea to serialize them into a single column, since you'll have to use per-row functions to get the information out - this rarely scales well.
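To make the cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the helper names (make_serialized_db, find_by_attribute) are assumptions made up for illustration, not something from the question. The point is only that a search on one attribute has to read and decode every row.

```python
import json
import sqlite3


def make_serialized_db():
    """Build a throwaway in-memory table with one JSON text column (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "object_id INTEGER PRIMARY KEY, object_name TEXT, attributes TEXT)"
    )
    rows = [
        (1, "chair", json.dumps({"color": "red", "legs": 4})),
        (2, "table", json.dumps({"color": "brown", "legs": 4, "material": "oak"})),
        (3, "lamp", json.dumps({"color": "red", "watts": 40})),
    ]
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    return conn


def find_by_attribute(conn, name, value):
    """Return ids of objects whose serialized attributes contain name == value.

    Every row is fetched and decoded, because the database cannot index
    into the serialized blob: this is the per-row work described above.
    """
    matches = []
    query = "SELECT object_id, attributes FROM objects ORDER BY object_id"
    for object_id, raw in conn.execute(query):
        if json.loads(raw).get(name) == value:
            matches.append(object_id)
    return matches
```

Calling find_by_attribute(make_serialized_db(), "color", "red") decodes all three rows to find the two red objects. With three rows that is harmless, but the work grows with the size of the table rather than with the number of matches.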

    I would opt for your second choice. Have a list of attributes in an attributes table, the objects in their own objects table, and a many-to-many relationship table called object_attributes.

    For example:

        objects:
            object_id    integer
            object_name  varchar(20)
            primary key  (object_id)
        attributes:
            attr_id      integer
            attr_name    varchar(20)
            primary key  (attr_id)
        object_attributes:
            object_id    integer  references (objects.object_id)
            attr_id      integer  references (attributes.attr_id)
            oa_value     varchar(20)
            primary key  (object_id,attr_id)
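As a rough sketch of how that layout could look in practice, the following uses Python's built-in sqlite3 module. The SQLite dialect, the constant SCHEMA and the helpers make_db and fetch_object are assumptions for illustration only. It shows that fetching an entire object with all of its attributes comes down to one joined query.

```python
import sqlite3

# The three tables from the answer, written in SQLite's dialect (an assumption).
SCHEMA = """
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY,
    object_name VARCHAR(20)
);
CREATE TABLE attributes (
    attr_id   INTEGER PRIMARY KEY,
    attr_name VARCHAR(20)
);
CREATE TABLE object_attributes (
    object_id INTEGER REFERENCES objects(object_id),
    attr_id   INTEGER REFERENCES attributes(attr_id),
    oa_value  VARCHAR(20),
    PRIMARY KEY (object_id, attr_id)
);
"""


def make_db():
    """Create an empty in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fetch_object(conn, object_id):
    """Load one object and all of its attributes with a single joined query.

    LEFT JOINs keep an object that has no attributes at all.
    Returns None when the object does not exist.
    """
    rows = conn.execute(
        """
        SELECT o.object_name, a.attr_name, oa.oa_value
        FROM objects o
        LEFT JOIN object_attributes oa ON oa.object_id = o.object_id
        LEFT JOIN attributes a ON a.attr_id = oa.attr_id
        WHERE o.object_id = ?
        """,
        (object_id,),
    ).fetchall()
    if not rows:
        return None
    attrs = {name: value for _, name, value in rows if name is not None}
    return {"name": rows[0][0], "attributes": attrs}
```

Because oa_value is a varchar column, every value comes back as text, so converting values to numbers or dates is left to the application in this sketch.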

    Your concern about performance is noted but, in my experience, it's always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it's perfectly acceptable to break 3NF for performance reasons.

    In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won't have any problems. But you shouldn't worry about that until an actual problem surfaces.

    By using those triggers, you do the extra work only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.
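A possible shape for such triggers, sketched with Python's sqlite3 module. Everything specific here is an assumption for illustration: the SQLite dialect, the extra attrs_raw column, the simple "name=value;name=value" serialization, and the helper names. A delete trigger is included as well, which goes slightly beyond the insert/update triggers mentioned above.

```python
import sqlite3

TABLES = """
CREATE TABLE objects (
    object_id   INTEGER PRIMARY KEY,
    object_name VARCHAR(20),
    attrs_raw   TEXT  -- hypothetical denormalized copy, written only by triggers
);
CREATE TABLE attributes (
    attr_id   INTEGER PRIMARY KEY,
    attr_name VARCHAR(20)
);
CREATE TABLE object_attributes (
    object_id INTEGER REFERENCES objects(object_id),
    attr_id   INTEGER REFERENCES attributes(attr_id),
    oa_value  VARCHAR(20),
    PRIMARY KEY (object_id, attr_id)
);
"""

# Statement that rebuilds the combined column for the object a changed row
# belongs to. {row} becomes NEW or OLD depending on the trigger.
REBUILD = """
UPDATE objects SET attrs_raw = (
    SELECT group_concat(a.attr_name || '=' || oa.oa_value, ';')
    FROM object_attributes oa
    JOIN attributes a ON a.attr_id = oa.attr_id
    WHERE oa.object_id = {row}.object_id
) WHERE object_id = {row}.object_id;
"""


def make_synced_db():
    """Create the tables plus one sync trigger per kind of change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TABLES)
    for event, row in (("INSERT", "NEW"), ("UPDATE", "NEW"), ("DELETE", "OLD")):
        conn.executescript(
            f"CREATE TRIGGER oa_sync_{event.lower()} "
            f"AFTER {event} ON object_attributes "
            f"BEGIN {REBUILD.format(row=row)} END;"
        )
    return conn


def read_raw(conn, object_id):
    """Return the combined column as a set of 'name=value' strings."""
    (raw,) = conn.execute(
        "SELECT attrs_raw FROM objects WHERE object_id = ?", (object_id,)
    ).fetchone()
    return set(raw.split(";")) if raw else set()
```

Reading attrs_raw is then a plain single-row select, while the rebuild runs only when a row in object_attributes changes. The sketch ignores some cases a real system would have to cover, such as an update that moves a row to a different object_id or a rename in the attributes table, and the serialization breaks if a name or value contains "=" or ";".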