Search code examples
databaseentity-relationship

How to represent data with meaning dependent on other columns?


This question is about relational databases like postgresql or oracle. The following is a toy example of my problem. Say I have five tables

Client (id, name)
1 Joe
2 Ted

Factory (id, name, salary)
1 BMW     20
2 Porsche 30

Farm (id, name, salary)
1 Wineyard    10
2 Cattle farm  5

Occupation
1 Farmer
2 Worker

Client_Occupation (client-id, occupation-id, dependent-id)
1 (Joe) 1 (Farmer) 1 (Wineyard)
2 (Ted) 2 (Worker) 2 (Porsche)

To find out how much Joe earns, the sql needs to use data from either the table farms (if the occupation id is 1) or factories (if the occupation id is 2). This is creating very convoluted queries and makes the processing code more complex than it should be.

Is there a better way to structure this data if I do not want to merge the factories and farms tables?


In other words: the table client_ocupation has a conditional relation to either Farm or Factory. Is there a better way to represent this information?

                 Client
                   ^
                   |
Factory <-  Client_Occupation -> Farm
                   |
                   v
               Occupation

Solution

  • This comes up often when modeling OO hierarchies, and is a table inheritance pattern. Generally, I prefer Concrete Table Inheritance over Single Table Inheritance, but they both have their use cases.

    Single Table Inheritance is really easy - just take your Farm and Factory tables and merge them together for a superset. Done and done. Unfortunately, those niceties things like constraints become difficult, and you'll find yourself writing a lot of CASE expressions on the discriminator column.

    Concrete Table Inheritance takes a little time to grok, but is actually pretty simple as well. You design a base table w/common attributes and a discriminator column, and then "sub" tables w/specific attributes for each level of the hierarchy. The sub tables are linked back to the base table by Id and Discriminator - which provides some query optimizer gains and prevents a single entity from being multiple types.

    For your example problem, you really don't have any specific attributes on the leaves - so this will look a little odd, but the pattern still works (note that I'm renaming "dependent-id" to "location-id" to make it more understandable):

    Occupation (id, name)
    1 Farmer
    2 Worker
    
    # the base table for all locations w/common attributes
    Location (id, type, name, salary)
    # unique constraint on discriminator column, and another on the combination of id and type so that we can reference in a foreign key
    PK(id), UC(type), UC(id, type)
    1 Farm Wineyard 10
    2 Farm Cattle farm 5
    3 Factory BMW 20
    4 Factory Porsche 30
    
    Client_Occupation (client-id, occupation-id, location-id)
    FK location-id => location.id
    1 (Joe) 1 (Farmer) 3 (Wineyard)
    2 (Ted) 2 (Worker) 1 (Porsche)
    
    Farm
    # carry the discriminator column and check constraint it; any Farm specific columns can be added here
    PK(id), FK(id, type) => location(id, type), CHECK type = 'Farm'
    Factory (id, type)
    1 Farm
    2 Farm
    
    Factory
    # carry the discriminator column and check constraint it; any Factory specific columns can be added here
    PK(id), FK(id, type) => location(id, type), CHECK type = 'Factory'
    Factory (id, type)
    3 Factory
    4 Factory
    

    You can now easily get salary for everything applicable:

    SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
    

    And, if you want to get specific Farm columns, or limit it to Farm locations:

    SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
    JOIN Farm F ON CO.id = F.id and CO.type = F.type // technically, joining on type is unnecessary, but I prefer to include it anyway
    

    And, if you want to emulate the Single Table inheritance model:

    SELECT * FROM Client_Occupation CO JOIN Location L ON CO.location-id = L.id
    LEFT OUTER JOIN Farm F ON CO.id = F.id and CO.type = F.type 
    LEFT OUTER JOIN Factory T ON CO.id = T.id and CO.type = T.type 
    

    On occasion, it's difficult to refactor an existing table design into this pattern. In that case, a view for your base table provides a lot of the same querying benefits:

    CREATE VIEW Location AS
       SELECT id, 'Farm' as type, name, salary FROM Farm
       UNION ALL
       SELECT id, 'Factory' as type, name, salary FROM Factory
    

    At that point, you will (probably) have conflicting ids (eg., a FarmId = 1 and FactoryId = 1) so you'll need to be diligent about including the type column in any joins.