Search code examples
sqlsql-serverdatabase-designschemaentity-attribute-value

How do you model custom attributes of entities?


Let's say we're having an application which should be able to store all kind of products. Each product has at least an ID and a Name but all other attributes can be defined by the user himself.

  1. E.g. He could create a productgroup Ipods which would contain attributes capacity and generation
  2. E.g. He could create a productgroup TShirts with the attributes size and color
  3. We need to store the definition of a product and the concrete products itself.
  4. We want to ensure that it is easily possible to aggregate (GROUP BY) by product attributes. E.g. select the total sum of capacity for each generation of ipods
  5. The solution must not require schema changes (added requirement due to input from Bill Karwin - see his answer as well!)

How would you model your schema in respect to the above requirements?

Note: Requirment 4. is important!

Thanks everyone for contributing and discussing the approach. I have seen some solutions to this problem in the past but none of them made grouping easy for me :(


Solution

  • The grouping is not going to be easy because what aggregate operator are you going to use on "color"? Note that it is not possible to use your requirement 4 on case 2.

    In any case, the aggregating is only difficult because of the variation in data types and can be mitigated by approaching it in a more typesafe way - knowing that it never makes sense to add apples and oranges.

    This is the classic EAV model and it has a place in databases where carefully designed. In order to make it a bit more typesafe, I've seen cases where the values are stored in type-safe tables instead of in a single free form varchar column.

    Instead of Values:

    EntityID int
    ,AttributeID int
    ,Value varchar(255)
    

    You have multiple tables:

    EntityID int
    ,AttributeID int
    ,ValueMoney money
    
    EntityID int
    ,AttributeID int
    ,ValueInt int
    
    etc.
    

    Then to get your iPod capacity per generation:

    SELECT vG.ValueVarChar AS Generation, SUM(vC.ValueDecimal) AS TotalCapacity
    FROM Products AS p
    INNER JOIN Attributes AS aG
        ON aG.AttributeName = 'generation'
    INNER JOIN ValueVarChar AS vG
        ON vG.EntityID = p.ProductID
        AND vG.AttributeID = aG.AttributeID
    INNER JOIN Attributes AS aC
        ON aC.AttributeName = 'capacity'
    INNER JOIN ValueDecimal AS vC
        ON vC.EntityID = p.ProductID
        AND vC.AttributeID = aC.AttributeID
    GROUP BY vG.ValueVarChar